jonathan-bravo / TELCoMB

GNU General Public License v3.0

Potential duplicates in colocalization results #1

Closed: fermpeter closed this issue 6 months ago

fermpeter commented 8 months ago

The colocalization_plot.svg seems to contain duplicates, defined as reads (rows) with approximately the same length and the same ARGs/MGEs in the same locations within the reads. For example, see reads 5 and 6 in the attached SVG (duplicates with plasmid 139561 and ARG MLS23S) and reads 7-9 (duplicates with ARG OXA and plasmid 155875).

We can supply the corresponding fastq file if needed. Thanks!

[Attached image: barcode0001_colocalizations_plot]
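The duplicate definition in the report (similar read length, same ARGs/MGEs at the same positions) can be written down as a check. The sketch below is a hypothetical illustration of that heuristic only, not how TELCoMB decides duplicates. The `Hit`/`AnnotatedRead` types and both tolerances (`length_tol`, `pos_tol`, 5% each) are assumptions, positions are compared relative to read length, and reverse-complemented reads are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    feature: str  # ARG or MGE label, e.g. "MLS23S" (hypothetical representation)
    start: int    # 0-based start position on the read
    end: int      # end position on the read (exclusive)


@dataclass
class AnnotatedRead:
    read_id: str
    length: int
    hits: list[Hit]


def looks_like_duplicate(a: AnnotatedRead, b: AnnotatedRead,
                         length_tol: float = 0.05, pos_tol: float = 0.05) -> bool:
    """Hypothetical heuristic: similar length and the same features, in the
    same order, at similar relative positions. Tolerances are assumptions."""
    # 1. Lengths must agree within length_tol of the longer read.
    longer = max(a.length, b.length)
    if abs(a.length - b.length) > length_tol * longer:
        return False

    # 2. Same features in the same left-to-right order.
    ha = sorted(a.hits, key=lambda h: h.start)
    hb = sorted(b.hits, key=lambda h: h.start)
    if [h.feature for h in ha] != [h.feature for h in hb]:
        return False

    # 3. Each feature sits at a similar relative position on both reads.
    for x, y in zip(ha, hb):
        if abs(x.start / a.length - y.start / b.length) > pos_tol:
            return False
        if abs(x.end / a.length - y.end / b.length) > pos_tol:
            return False
    return True
```

A check like this only compares annotations. It cannot tell whether two reads are actually copies of the same molecule, which needs a sequence-level comparison such as the BLAT alignment discussed later in this thread.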

jonathan-bravo commented 8 months ago

Would be helpful to see a couple of files:

fermpeter commented 8 months ago

Yes, I can definitely provide copies of these files! Thanks so much!

jonathan-bravo commented 7 months ago

I just did a check of the first two reads that look like duplicates.

psLayout version 3

match   mis-    rep.    N's Q gap   Q gap   T gap   T gap   strand  Q           Q       Q       Q   T           T       T       T   block   blockSizes  qStarts  tStarts
        match   match       count   bases   count   bases           name        size    start   end name        size    start   end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
4521    0   0   0   0   0   0   0   +   00ee74cb-bc5b-4ab8-867f-64f2a1512499    4521    0   4521    00ee74cb-bc5b-4ab8-867f-64f2a1512499    4521    0   4521    1   4521,   0,  0,
1805    48  0   0   50  245 45  254 +   00ee74cb-bc5b-4ab8-867f-64f2a1512499    4521    836 2934    20b4b3ba-eb96-470a-876a-3a865e5af217    4458    799 2906    63  58,6,47,8,4,54,43,8,38,7,5,11,86,13,4,9,62,26,27,11,157,14,4,11,31,15,30,36,12,6,58,52,26,59,84,35,8,54,37,18,9,8,55,35,36,6,32,63,14,4,9,11,5,23,17,19,33,4,46,40,25,16,69,    836,900,906,954,982,987,1042,1085,1094,1132,1141,1147,1163,1251,1264,1269,1283,1351,1378,1414,1430,1598,1612,1633,1647,1679,1696,1747,1783,1795,1802,1865,1921,1949,2012,2098,2136,2145,2202,2240,2259,2270,2278,2333,2368,2405,2413,2456,2519,2534,2544,2554,2579,2585,2611,2628,2648,2699,2704,2766,2816,2842,2865,   799,863,870,917,943,947,1001,1045,1053,1092,1100,1105,1116,1205,1219,1223,1237,1301,1329,1363,1381,1557,1572,1592,1610,1641,1657,1706,1744,1757,1764,1828,1888,1914,1978,2063,2098,2107,2165,2204,2226,2235,2245,2302,2339,2375,2382,2426,2490,2504,2511,2520,2551,2556,2583,2603,2622,2675,2679,2743,2789,2816,2837,
1805    48  0   0   45  254 50  245 +   20b4b3ba-eb96-470a-876a-3a865e5af217    4458    799 2906    00ee74cb-bc5b-4ab8-867f-64f2a1512499    4521    836 2934    63  58,6,47,8,4,54,43,8,38,7,5,11,86,13,4,9,62,26,27,11,157,14,4,11,31,15,30,36,12,6,58,52,26,59,84,35,8,54,37,18,9,8,55,33,38,6,32,63,14,4,11,9,5,23,17,19,33,4,46,40,25,16,69,    799,863,870,917,943,947,1001,1045,1053,1092,1100,1105,1116,1205,1219,1223,1237,1301,1329,1363,1381,1557,1572,1592,1610,1641,1657,1706,1744,1757,1764,1828,1888,1914,1978,2063,2098,2107,2165,2204,2226,2235,2245,2302,2337,2375,2382,2426,2490,2504,2511,2522,2551,2556,2583,2603,2622,2675,2679,2743,2789,2816,2837,   836,900,906,954,982,987,1042,1085,1094,1132,1141,1147,1163,1251,1264,1269,1283,1351,1378,1414,1430,1598,1612,1633,1647,1679,1696,1747,1783,1795,1802,1865,1921,1949,2012,2098,2136,2145,2202,2240,2259,2270,2278,2333,2366,2405,2413,2456,2519,2534,2544,2556,2579,2585,2611,2628,2648,2699,2704,2766,2816,2842,2865,
4458    0   0   0   0   0   0   0   +   20b4b3ba-eb96-470a-876a-3a865e5af217    4458    0   4458    20b4b3ba-eb96-470a-876a-3a865e5af217    4458    0   4458    1   4458,   0,  0,

So, according to BLAT, these reads aren't even a 50% match. They share 1805 matching bases, but each read is ~4,500 bp long (4521 and 4458 bp), so the shared region covers only about 40% of either read and they are not marked as duplicate reads.
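The coverage reasoning above can be reproduced directly from the PSL rows. The sketch below relies on the standard 21-column PSL layout (column 0 is `matches`, 9/10 are query name/size, 13/14 are target name/size). It splits on any whitespace because the pasted output uses spaces, while real PSL files are tab-delimited. The 50% cutoff (`min_fraction`) is a hypothetical value taken from the wording of the comment; the thread does not state TELCoMB's actual duplicate rule. Self-hits, which an all-vs-all BLAT run always produces, are skipped.

```python
from typing import Iterable, NamedTuple


class PslHit(NamedTuple):
    matches: int  # PSL column 0: matching bases that are not repeats
    q_name: str   # column 9
    q_size: int   # column 10
    t_name: str   # column 13
    t_size: int   # column 14


def parse_psl(lines: Iterable[str]) -> Iterable[PslHit]:
    """Yield alignment rows from PSL text, skipping the psLayout header."""
    for line in lines:
        fields = line.split()
        # Data rows have exactly 21 columns and start with an integer count.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        yield PslHit(int(fields[0]), fields[9], int(fields[10]),
                     fields[13], int(fields[14]))


def duplicate_pairs(hits: Iterable[PslHit], min_fraction: float = 0.5):
    """Return (query, target, fraction) for non-self hits whose matched bases
    cover at least min_fraction of BOTH reads (min_fraction is hypothetical)."""
    pairs = []
    for h in hits:
        if h.q_name == h.t_name:
            continue  # each read aligned to itself; not a duplicate
        # Dividing by the longer read gives the smaller of the two coverages,
        # so passing this test means both reads are covered enough.
        frac = h.matches / max(h.q_size, h.t_size)
        if frac >= min_fraction:
            pairs.append((h.q_name, h.t_name, frac))
    return pairs
```

For the two reads above, 1805 / 4521 ≈ 0.40 and 1805 / 4458 ≈ 0.40, so neither cross-hit passes a 50% cutoff, which matches the conclusion that the reads are not duplicates.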