Closed fermpeter closed 6 months ago
Would be helpful to see a couple of files:
Yes, I can definitely provide copies of these files! Thanks so much!
I just did a check of the first two reads that look like duplicates.
psLayout version 3
match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts
match match count bases count bases name size start end name size start end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
4521 0 0 0 0 0 0 0 + 00ee74cb-bc5b-4ab8-867f-64f2a1512499 4521 0 4521 00ee74cb-bc5b-4ab8-867f-64f2a1512499 4521 0 4521 1 4521, 0, 0,
1805 48 0 0 50 245 45 254 + 00ee74cb-bc5b-4ab8-867f-64f2a1512499 4521 836 2934 20b4b3ba-eb96-470a-876a-3a865e5af217 4458 799 2906 63 58,6,47,8,4,54,43,8,38,7,5,11,86,13,4,9,62,26,27,11,157,14,4,11,31,15,30,36,12,6,58,52,26,59,84,35,8,54,37,18,9,8,55,35,36,6,32,63,14,4,9,11,5,23,17,19,33,4,46,40,25,16,69, 836,900,906,954,982,987,1042,1085,1094,1132,1141,1147,1163,1251,1264,1269,1283,1351,1378,1414,1430,1598,1612,1633,1647,1679,1696,1747,1783,1795,1802,1865,1921,1949,2012,2098,2136,2145,2202,2240,2259,2270,2278,2333,2368,2405,2413,2456,2519,2534,2544,2554,2579,2585,2611,2628,2648,2699,2704,2766,2816,2842,2865, 799,863,870,917,943,947,1001,1045,1053,1092,1100,1105,1116,1205,1219,1223,1237,1301,1329,1363,1381,1557,1572,1592,1610,1641,1657,1706,1744,1757,1764,1828,1888,1914,1978,2063,2098,2107,2165,2204,2226,2235,2245,2302,2339,2375,2382,2426,2490,2504,2511,2520,2551,2556,2583,2603,2622,2675,2679,2743,2789,2816,2837,
1805 48 0 0 45 254 50 245 + 20b4b3ba-eb96-470a-876a-3a865e5af217 4458 799 2906 00ee74cb-bc5b-4ab8-867f-64f2a1512499 4521 836 2934 63 58,6,47,8,4,54,43,8,38,7,5,11,86,13,4,9,62,26,27,11,157,14,4,11,31,15,30,36,12,6,58,52,26,59,84,35,8,54,37,18,9,8,55,33,38,6,32,63,14,4,11,9,5,23,17,19,33,4,46,40,25,16,69, 799,863,870,917,943,947,1001,1045,1053,1092,1100,1105,1116,1205,1219,1223,1237,1301,1329,1363,1381,1557,1572,1592,1610,1641,1657,1706,1744,1757,1764,1828,1888,1914,1978,2063,2098,2107,2165,2204,2226,2235,2245,2302,2337,2375,2382,2426,2490,2504,2511,2522,2551,2556,2583,2603,2622,2675,2679,2743,2789,2816,2837, 836,900,906,954,982,987,1042,1085,1094,1132,1141,1147,1163,1251,1264,1269,1283,1351,1378,1414,1430,1598,1612,1633,1647,1679,1696,1747,1783,1795,1802,1865,1921,1949,2012,2098,2136,2145,2202,2240,2259,2270,2278,2333,2366,2405,2413,2456,2519,2534,2544,2556,2579,2585,2611,2628,2648,2699,2704,2766,2816,2842,2865,
4458 0 0 0 0 0 0 0 + 20b4b3ba-eb96-470a-876a-3a865e5af217 4458 0 4458 20b4b3ba-eb96-470a-876a-3a865e5af217 4458 0 4458 1 4458, 0, 0,
So according to blat these reads aren't even a 50% match. They share 1805 matching bases, but because the reads are ~4500 bp each (4521 and 4458 bp), that is only about 40% of either read, so they are not marked as duplicate reads.
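To make the arithmetic explicit, here is a minimal Python sketch that computes the matched fraction from one PSL line. The function name `psl_match_fraction` is made up for illustration, and this only shows the calculation behind the percentages; it is not how the tool itself decides what counts as a duplicate. The column positions follow the standard PSL layout (matches in column 1, qSize in column 11, tSize in column 15).

```python
def psl_match_fraction(psl_line: str) -> dict:
    """Return matched-base fractions for one whitespace-separated PSL line."""
    fields = psl_line.split()
    matches = int(fields[0])                      # column 1: matching bases
    q_name, q_size = fields[9], int(fields[10])   # query name and length
    t_name, t_size = fields[13], int(fields[14])  # target name and length
    return {
        "query": q_name,
        "target": t_name,
        "matches": matches,
        "frac_of_query": matches / q_size,
        "frac_of_target": matches / t_size,
    }


# Second alignment from the blat output above. The three trailing block
# lists are replaced by placeholders because they are not used here.
line = (
    "1805 48 0 0 50 245 45 254 + "
    "00ee74cb-bc5b-4ab8-867f-64f2a1512499 4521 836 2934 "
    "20b4b3ba-eb96-470a-876a-3a865e5af217 4458 799 2906 63 x, x, x,"
)

result = psl_match_fraction(line)
print(f"{result['frac_of_query']:.1%} of query, "
      f"{result['frac_of_target']:.1%} of target")
# prints: 39.9% of query, 40.5% of target
```

Both fractions come out around 40%, which is where the "not even a 50% match" reading comes from.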
The colocalization_plot.svg seems to contain duplicates, defined as reads (rows) with approximately the same length and the same ARGs/MGEs in the same locations within the reads. For an example, please see reads 5 and 6 in the attached SVG (duplicates with plasmid 139561 and ARG MLS23S) and reads 7-9 (duplicates with ARG OXA and plasmid 155875).
We can supply the corresponding fastq file if needed. Thanks!