graph-genome / Schematize

Visualization component of Pangenome Schematics for 1,000s of individuals and gigabase genomes.
http://graphgenome.org
Apache License 2.0

Ambiguity Code Handling #44

Open josiahseaman opened 4 years ago

josiahseaman commented 4 years ago

Problem: SARS-CoV2 genomes already downloaded have ambiguity codes in them. Our pipeline needs to handle this in some way.

Simon Heumos: Codes like 'Y' or 'K' each encode a set of possible nucleotides: 'C, T or U' and 'G, T or U' respectively. For more detailed information see the FASTA specification https://en.wikipedia.org/wiki/FASTA_format. I also ran a short Python script to validate each FASTA entry. The key function was https://biopython.org/DIST/docs/api/Bio.Alphabet-module.html#_verify_alphabet. All entries passed the test.

Adding the Python script for completeness: sars_filter_proteins.py

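from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.Alphabet import _verify_alphabet  # Bio.Alphabet was removed in Biopython 1.78; needs an older release

for record in SeqIO.parse("SARS-CoV-2.genbank.20200329.fasta", "fasta"):
    # The loop body was cut off in the post; this is reconstructed from the description (assumption).
    seq = Seq(str(record.seq), IUPAC.ambiguous_dna)
    if not _verify_alphabet(seq):
        print(record.id)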

If the script is run, no sequence identifier is printed. Do you think this will be an issue for downstream analysis such as mapping or graph manipulation in vg? If so, we could mask all these ambiguous nucleotides with 'N'.
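As a rough sketch of the same check without Biopython, the snippet below validates sequences against the IUPAC nucleotide codes and shows what masking with 'N' would look like. The helper names (parse_fasta, invalid_letters, mask_ambiguous) are made up for illustration and are not part of the pipeline.

```python
# IUPAC nucleotide codes and the bases each one stands for
# (T is listed; U is the RNA equivalent).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Identifier is the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def invalid_letters(seq):
    """Return the set of letters that are not valid IUPAC nucleotide codes."""
    return set(seq.upper()) - set(IUPAC_DNA)


def mask_ambiguous(seq):
    """Replace every character other than A, C, G, T with 'N' (assumed masking policy)."""
    return "".join(c if c.upper() in "ACGT" else "N" for c in seq)
```

With this, a record would only be reported when invalid_letters() returns a non-empty set, which matches the behaviour described above: valid ambiguity codes such as 'S' pass the check silently.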

Josiah Seaman: My initial inclination is to leave the data the way it is. Those ambiguity codes are actual information after all. A far downstream feature would actually be unpacking those ambiguities into the matrix to see what is plausible based on extant variation in others, but that's low value at the moment. We also don't want to bloat the pangenome sequence with N's; it's almost better to leave those as a gap. ... With N's there's a new kind of coverage: "sequence is known to be present, but we don't know what it is".

Conclusion: preserve the ambiguity codes all the way up to FASTA chunks so that they can be visible in sequence download: https://github.com/graph-genome/Schematize/issues/36

In rendering, show ambiguity as gaps, no coverage.

Highest Priority: Pipeline doesn't crash or discard entire sample because of ambiguity codes.
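One way the "show ambiguity as gaps" idea might be fed into rendering is to locate the ambiguous stretches up front, without touching the sequence itself. This is only a sketch; ambiguous_intervals is a hypothetical helper and says nothing about how Schematize actually stores coverage.

```python
import re

# Any run of characters that is not an unambiguous base.
AMBIGUOUS_RUN = re.compile(r"[^ACGTacgt]+")


def ambiguous_intervals(seq):
    """Return half-open (start, end) intervals of ambiguous stretches in seq."""
    return [(m.start(), m.end()) for m in AMBIGUOUS_RUN.finditer(seq)]
```

The sequence stays intact for download, and a renderer could treat the returned intervals as "no coverage" instead of discarding the sample.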

6br commented 4 years ago

May I transfer this to the proper repository, e.g. component_segmentation or pipeline? I think this does not affect the visualization itself.

josiahseaman commented 4 years ago

There's something important I've noticed about our dataset: Those "private insertion" nodes we've been hunting down contain ambiguity codes. We talked about these in April https://github.com/graph-genome/Schematize/issues/44 but currently they're being handled as completely novel sequence. That may be why ClustalW is giving different results; it would properly handle ambiguity codes as being a near trivial difference.

For example, MT246470 starts with TTTATACCTTCSCCCC, which separates it from others without ambiguities.

This requires some thought. We don't want to report a variant based only on an ambiguity code, which is what we're doing right now. But if we already had a variant nucleotide at that position, I'd be more inclined to take an ambiguity seriously. One path for now would be to always make them N's, but that still causes the same problem of creating new nodes where they are not necessarily needed. The other option would be to set the sequence to match "consensus" where ambiguities are present. This would either need to be before seqwish, or in the alignment process itself. How does minimap handle ambiguities?
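The "match consensus" option could look roughly like the sketch below. It assumes the sample and the consensus are already aligned to the same length, which is exactly the part that would have to happen before seqwish or inside the aligner; resolve_to_consensus is a hypothetical name.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_to_consensus(sample, consensus):
    """Replace an ambiguity code with the consensus base when the two are compatible.

    Both inputs must be pre-aligned strings of equal length (assumption).
    Output is upper-cased. Codes that exclude the consensus base are kept,
    since they may point at a real variant.
    """
    if len(sample) != len(consensus):
        raise ValueError("sample and consensus must be aligned to the same length")
    resolved = []
    for s, c in zip(sample.upper(), consensus.upper()):
        options = IUPAC_DNA.get(s, "")
        if len(options) > 1 and c in options:
            resolved.append(c)  # ambiguity is consistent with consensus
        else:
            resolved.append(s)  # plain base, incompatible code, or unknown symbol
    return "".join(resolved)
```

For the MT246470 fragment, an 'S' (C or G) over a consensus 'C' would collapse to 'C' and stop creating a private node, while an 'S' over a consensus 'A' would be left alone.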

This is all I was able to find in https://lh3.github.io/minimap2/minimap2.html:

--score-N INT    Score of a mismatch involving ambiguous bases.
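If that option turns out to be the right lever, the alignment call could be assembled along these lines. This is a sketch only: the preset, the -c flag and the chosen score are assumptions about how the pipeline invokes minimap2, and whether a score of 0 is the right value would need testing.

```python
def minimap2_command(reference, query, score_n=0, preset="asm20"):
    """Build a minimap2 argument list with an explicit --score-N value.

    -c (emit CIGAR in PAF) and the asm20 preset are assumed settings,
    not taken from the actual pipeline configuration.
    """
    return [
        "minimap2", "-c", "-x", preset,
        "--score-N", str(score_n),
        reference, query,
    ]
```

The list can be passed to subprocess.run() unchanged, which avoids shell quoting issues with file names.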

josiahseaman commented 4 years ago

Possible options:

image

josiahseaman commented 4 years ago

Identified problematic sequences

['http://collections.lugli.arvadosapi.com/c=00ef4c4427c0881a0030f7f400ce1ed0+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=1a191370cb868f80c824d93f9169599a+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=9e6fe32c3f7d281332ba958b5f62d109+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bafb25a84fa5167d5a049fa43d607a44+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=9fe51f2847f3e8e3060c9ddebf3a41e5+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d637278d9b95bbd1a5ef0bcd17a95c21+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=53fa57b401f3695feb0facf498f60871+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=392451211d0b7500ebaaa4e3182838be+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bc7dcac01570c2fb81f16f76b98add9d+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=898c212f7a9d4984c382d782bad53fd4+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f8001cec2144c59cbd851706b898ddfe+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=71063763aabd91e0b33d6861294bdff6+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=57dca4995c2186b11b67ab1cff0b005b+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f95a298c57718bf290d9facdda59eb66+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=71da768110cd21ff99f5664bc335a4ec+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=06f5726c45483d0e8fdea3004f2c4adf+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f9cea932bff8e83a2cb490c3bd694742+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5914683bbe1ff047a163b3e57110f11b+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=27bb9a654a5f46e08888f55021d37b17+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=a9be2d60f66fd03a75418b40306ededc+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=aa1d1c497dabed0589c8ea6423179441+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c6f8550cf6940591fea7de5f2159d88b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=ab9c2241bda0599d20877ece1e1bc04e+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5caa10de623c2384a31160c72a8f4f9c+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=0f24420528d58bff3468084aca3d7328+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=4887cadadce95997fed59d129e47b47b+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e8e00929537a550b0989be12147d6241+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=7ebbc05a6949a6ce0637fa692af183ad+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6566c86da5313159640092f16ac8a0cb+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d04a38579335168796dd8d25f362ff8f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=810d1e1012cbc4f63226159bd8b1fa08+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=4d40985616d6975a41a117c41fd38145+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d2062c46515c5fffed7d27b95a9e32c9+126/sequence.fasta']

00ef4c4427c0881a0030f7f400ce1ed0+123/sequence.fasta
1a191370cb868f80c824d93f9169599a+126/sequence.fasta
9e6fe32c3f7d281332ba958b5f62d109+123/sequence.fasta
bafb25a84fa5167d5a049fa43d607a44+126/sequence.fasta
9fe51f2847f3e8e3060c9ddebf3a41e5+123/sequence.fasta
d637278d9b95bbd1a5ef0bcd17a95c21+123/sequence.fasta
53fa57b401f3695feb0facf498f60871+123/sequence.fasta
392451211d0b7500ebaaa4e3182838be+123/sequence.fasta
bc7dcac01570c2fb81f16f76b98add9d+126/sequence.fasta
898c212f7a9d4984c382d782bad53fd4+123/sequence.fasta
f8001cec2144c59cbd851706b898ddfe+123/sequence.fasta
71063763aabd91e0b33d6861294bdff6+123/sequence.fasta
57dca4995c2186b11b67ab1cff0b005b+126/sequence.fasta
f95a298c57718bf290d9facdda59eb66+123/sequence.fasta
71da768110cd21ff99f5664bc335a4ec+126/sequence.fasta
06f5726c45483d0e8fdea3004f2c4adf+123/sequence.fasta
f9cea932bff8e83a2cb490c3bd694742+123/sequence.fasta
5914683bbe1ff047a163b3e57110f11b+126/sequence.fasta
27bb9a654a5f46e08888f55021d37b17+126/sequence.fasta
a9be2d60f66fd03a75418b40306ededc+126/sequence.fasta
aa1d1c497dabed0589c8ea6423179441+123/sequence.fasta
c6f8550cf6940591fea7de5f2159d88b+123/sequence.fasta
ab9c2241bda0599d20877ece1e1bc04e+126/sequence.fasta
5caa10de623c2384a31160c72a8f4f9c+126/sequence.fasta
0f24420528d58bff3468084aca3d7328+123/sequence.fasta
4887cadadce95997fed59d129e47b47b+126/sequence.fasta
e8e00929537a550b0989be12147d6241+126/sequence.fasta
7ebbc05a6949a6ce0637fa692af183ad+126/sequence.fasta
6566c86da5313159640092f16ac8a0cb+123/sequence.fasta
d04a38579335168796dd8d25f362ff8f+123/sequence.fasta
810d1e1012cbc4f63226159bd8b1fa08+123/sequence.fasta
4d40985616d6975a41a117c41fd38145+123/sequence.fasta
d2062c46515c5fffed7d27b95a9e32c9+126/sequence.fasta

Complete path list from SARS_clean2.w1.json:

['http://collections.lugli.arvadosapi.com/c=043cb3a3da73556296b4ac0e08700e95+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c324b17273b04ca05891c58c8eac0642+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=ac45d6fcec4e0aa88822ed3c09688992+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=08d54b8a0863384ffc0d0bda17d8eb04+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=09add9e2832c0d8a69f949634eccba9b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=835f0d0ac434ee4cc8c6498344dce7bf+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=b0d14242fcf2b93eb58205718ae678c9+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=3cc742bbb66887d052c4d27b3e8054fb+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=4cd9c45869a2631a4abc1b72d909d207+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=b1e6537a55a788d3587878840bce8434+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=421283dcdf8b6889eb41167f9c0e8ae5+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=47368329dd2e3821c96108c26bf816f2+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=1c749502e367221416506d8a98c62931+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=39a7ae68a0f5c13e92a25b9d5e056e12+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=34f188cba92a4bcf5131fa52ce907b50+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=9ab9fdaf6577dfaf2f1f039bf57da8b3+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=cced4c887fc42181897fab63e3f9a760+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8e66782f70d63263d53da59e05b28fc2+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=1fd3b93003d4a334de31b516aca77b6e+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=88fa465ae7e89c55563ceecd7333f30c+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=52be92fd0ac4bcf820604b12b158a585+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=2a7ea91c087ba7864be50b002e59cb1d+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=cf3180bcd31aab06b152d85573f40526+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=4198a0afca1551d5ff5eff252c96cb26+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=ccf3ff94e2220e904574978dc8802d3d+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=354538e5bad96d8bed24a532908c2971+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bd65c85cd51ad6c66a3744b03acfa36b+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5848fd13498f07f7f560056809549a65+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5ee2f22022ec81151e973dcb340ab486+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=78e42d70ffbfd4704b9e42e299ab94e1+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=24213905a0ea7459adbb8c336508da0a+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bc65355f3396de3389c686d35efe767b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=3364ca069c24a05f2dff40eaae787d27+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=efed764c304c7203283591ce5a4e9428+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=23b662208f113497f7f72b22b249751d+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=043edb7de25ccf2127392766abff2ae5+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=0685da8ede4843921a88f23c8e6e0727+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=736a477d6afa8e5c9b629c9d5111ed2b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bd2c53fc5f5efa659262899eb4238c60+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=62a0b4d5d34480e2ca84c2debc92aebf+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=46ddd9858b46f96f91799d8d8ad4b348+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=34a8cfa1c8617b9ca4cc3356b34efb9b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e242adb0b77b030824f45513536e6570+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d1f7b781167759d8d323f7aa62fa9b1b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e88235e63dc39d190ce7520a021c0929+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=3db963bd5674284b445a4316262ddab1+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=11b957f68922a5b756a37a4f89e0abc8+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c9a2e80cab6b7eb8c75b60f9822182bd+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=75ba5748d88d1c391b440f2e60ffbf9f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=07a3fdcec07c519d3fc50bb88399b472+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=7ff02ba3adfe35ad195d8e29bf3bec6e+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c3415c183428203fc997ca66ff1b2f2b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5fca64c7a334fc2e6be025d2b86534ae+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=21860556c33028614687b9f64dbc35e0+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f5f18cfd65253940d5ce632a034d739e+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6d6a004bff574afe6778e85a48d22088+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e1798eabe9c1be3db23617553d805d55+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=3bf45fd656aee9fadc1b691186ab9393+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=0347330c2a6f7590897f37ddb3c9139e+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=85930158a522db7eb196bf1e31f4527a+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=2ce0024aa6de9908742630cfbb38dca3+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=2d551977a7f56e5195c4b13b1f388679+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=24500c2c81f2a1f14c4e5c45b4b87a40+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=3ebc053a952d5e08d727c1342f9f153f+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d52faa69aa363885c5da732c107b1520+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d6b10a4d9664fc641d4f446953183836+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8fe2319d28f4669f76ca0f84dd9f96e5+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=49c95349619c761219dc7f4e86898179+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=38136f4bcc785d9a9a48d7743aa78d29+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=574e138fd13ca8e9a23d01755f8117d3+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=af000a5943cae2851ad555e97c21bacc+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=67a86bf050de1417bad6ceb9a33c699f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=4a1003813ee8182bee06a96808d5d48b+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8d70942ab1f0f0b2f73e74f20486149b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=894de73fed7587816f5786690e0b3b1f+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=89425de0ef9fe85480a9d0d5a99fb6ee+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6d554434ef2c73704670569abfb09d75+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=32d3374b4ea65022a6cff41fd9df434f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=29b789985f6c6b98bf2340a0d966b29b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=eff278853b3e4214a33f3dea13e429c8+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d5ae742a26ebe53a636e3f09c0ca1198+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=b4acee3829965946fcfc83013641354d+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=95d3b8d2827213ac3eed48bf15d77aeb+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8a0f62a6fbf3b021648a08f50b2a8a26+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=7e8691c7252915438887d9afa2a0ebe7+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=97bc33aaa23c79a186a264102d0798f1+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8bb3b20ae4c06f034f3043b8332a4626+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e3b6041385b2770e8c340955cfca0f19+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=51a7a6d5a3f1e9007722c111e93534fa+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8b7892e9bdc53a70dea28708de144e03+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=1203fd65ddecf347276c4674e5ffec60+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=fd8b9483de22df4cb368957763b1be9d+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=53c01573cd1b3980aa581be11284a202+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6e1cda78ac96a151c510864504f2da1b+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=69158b8869cdff20e408029dbdc0a02c+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=a6ded77905a7eb8c7619fa04a4857398+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=89cc0cea88914477a7fff2d18d1ef562+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=bb16500c59e1636da18b12cd804cdf2f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f0f9f3816442c7d12960fb087648ed98+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6de24fbc080c455ad15a9842528c7169+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=75a77c5d621dbb34623e1252009e9b4a+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=a564ad9210fc1a886a191a22b37f8694+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d14435e768dbf4ade17187a01de16765+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=973decd2af5f3d460573f95afe7c9602+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=ece242c0e573c8bf95d5dc7e8401f086+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=0147e32602f0fce4eda532e477c94e0a+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=2bc94a71ea6415e2a4e6b216aa298f76+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=922cb8dd3e30ca2492036d88fd4fe140+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=67cdac863b6caec6a21ad037280f233e+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=23be6c8d4dd9478d168723602e93b8b7+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=fccca54265a62a9234bdbc225136c348+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5eeb1ffe9bd62e4021556fcba5c11dbb+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=27f5c3c46387d0ec05e28daae3b36a42+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=d6fd44a6cb2e84a9ebc143ba93409135+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f37c5a17f7a6f63bbe7d271dbdd369bb+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6262fe8e7c5711d6274d34f9a18ab788+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=140cbade366a3cb7d20db44ec8606b9c+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=1380953bca5441df8daa0923357ae440+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c2df1daaf06f2ee03ebafe2d6efee3b3+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6e886df062fcf6de45abca3bf9e353d6+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=137bf7ffd110317c79e2bd48796803d7+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=5d1116ae9759a8036589b4c1f23a7898+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=a02c7d25cc990e45dee8daf7a3ce09ba+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=8e6d52a8cf0219c4b1f8c5d4e59c5d39+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=c06f3d3781758ff2cb4087019aa2428e+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=988d613f03840b312336c41afd42b97f+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=ae0d4f6974d043e4b4fd30d6fb6406f0+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=6c15424572f6f1e5c77701993c627c3d+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=80946cc74d6f6ba751a8d7950d7b1c75+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f3532dadf11af43cec8f5e570db12601+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e25c5e309bcb8271735f571b1b354f32+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=34a24260528b2bb8d4957bcc62b59ea5+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=91bfb9ac6dce8dd4c406758decb70656+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=59172eea057dd24b550d969e1f4aac5f+126/sequence.fasta']

Last 6 Pathnames to be excluded:

['http://collections.lugli.arvadosapi.com/c=80946cc74d6f6ba751a8d7950d7b1c75+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=f3532dadf11af43cec8f5e570db12601+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=e25c5e309bcb8271735f571b1b354f32+126/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=34a24260528b2bb8d4957bcc62b59ea5+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=91bfb9ac6dce8dd4c406758decb70656+123/sequence.fasta',
 'http://collections.lugli.arvadosapi.com/c=59172eea057dd24b550d969e1f4aac5f+126/sequence.fasta']
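Applying an exclusion list like the one above to the complete path list could be done with a small filter. This is a sketch with hypothetical helper names (exclude_paths, collection_id); the shortened lists in the usage lines stand in for the full lists pasted in this thread.

```python
def exclude_paths(all_paths, *exclusion_lists):
    """Return all_paths in original order, minus anything in the exclusion lists."""
    excluded = set().union(*exclusion_lists)
    return [path for path in all_paths if path not in excluded]


def collection_id(url):
    """Strip the host prefix, leaving e.g. '<hash>+123/sequence.fasta'."""
    return url.split("c=", 1)[1]


# Usage with a few entries taken from the lists in this thread.
base = "http://collections.lugli.arvadosapi.com/c="
complete = [
    base + "043cb3a3da73556296b4ac0e08700e95+123/sequence.fasta",
    base + "91bfb9ac6dce8dd4c406758decb70656+123/sequence.fasta",
    base + "59172eea057dd24b550d969e1f4aac5f+126/sequence.fasta",
]
to_exclude = [
    base + "91bfb9ac6dce8dd4c406758decb70656+123/sequence.fasta",
    base + "59172eea057dd24b550d969e1f4aac5f+126/sequence.fasta",
]
kept = exclude_paths(complete, to_exclude)
```

Keeping the order of the original list matters if downstream steps index paths by position in the JSON.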