Open tseemann opened 6 years ago
Thanks for the report!
Hi @tseemann, finally having some time to look at this. I'm beginning to feel like this works as intended ™️. If there are multiple assemblies for a strain, the strain dir will have multiple files. The way I understand your report is that this isn't what you are expecting.
From your perspective, what is the benefit of having two `strain_assembly_id` folders, rather than two files in a `strain` folder?
These are not different assemblies of the same thing though?
https://www.ncbi.nlm.nih.gov/assembly/GCF_000011625.1/ https://www.ncbi.nlm.nih.gov/assembly/GCF_000833045.1/
They are different biosamples. They just happen to have the same "strain" name, but this is not an enforced unique field. They could have come originally from the same freezer stock and been passaged since? Some labs use such generic strain IDs that clashes happen all the time.
The `strain` column isn't unique. Might need to detect this, and append the `GCF_` number to the strain to discriminate?
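
Something along these lines might work as a rough sketch of that idea. It is not taken from the tool's code: the function name `strain_dir_names`, the `(strain, accession)` tuple layout, the `StrainA`/`StrainB` labels, and the third accession are all made up for illustration. Only the first two accessions are the ones linked above.

```python
from collections import Counter


def strain_dir_names(entries):
    """Return one directory name per (strain, accession) entry.

    Hypothetical helper: a strain name that occurs more than once gets
    the assembly accession appended so the directories do not collide.
    """
    # Count how often each strain name appears across all entries.
    counts = Counter(strain for strain, _ in entries)
    names = []
    for strain, accession in entries:
        if counts[strain] > 1:
            # Clash detected: discriminate with the GCF_ accession.
            names.append(f"{strain}_{accession}")
        else:
            # Unique strain name: keep the plain strain directory.
            names.append(strain)
    return names


entries = [
    ("StrainA", "GCF_000011625.1"),  # placeholder strain label, accession from this thread
    ("StrainA", "GCF_000833045.1"),  # same strain label, different biosample
    ("StrainB", "GCF_000000000.1"),  # placeholder accession, not a real record
]

for name in strain_dir_names(entries):
    print(name)
```

With this approach, strains that are already unique keep their current directory name, so only the clashing ones would change layout.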