nsheff closed this issue 3 years ago.
Confirmed. When I wrap them both at the same column width, they give the same checksum:
head -n 1 z1.fa > z1_wrap.fa; sed 1d z1.fa | tr -d '\n' | fold -w 50 >> z1_wrap.fa
head -n 1 z2.fa > z2_wrap.fa; sed 1d z2.fa | tr -d '\n' | fold -w 50 >> z2_wrap.fa
md5sum z1_wrap.fa
6e57c10072f0b6bed3460b17ef2c9b87 z1_wrap.fa
md5sum z2_wrap.fa
6e57c10072f0b6bed3460b17ef2c9b87 z2_wrap.fa
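Rewrapping both files at the same width works, but the one-liners above keep only the first header and fold any later header lines into the sequence. The sketch below is one possible alternative, not something from this thread. The helper name `unwrapped_md5` is hypothetical, and it assumes a POSIX `awk` plus `md5sum` from GNU coreutils. It streams the file, keeps every header on its own line, joins each record's sequence lines, and ignores blank lines and CRLF endings, so files that differ only in wrapping should produce the same checksum.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): print one md5 for a FASTA file
# with line wrapping removed, so files that differ only in column width
# (or in CRLF vs LF line endings) produce the same checksum.
unwrapped_md5() {
  awk '
    { sub(/\r$/, "") }              # drop a trailing carriage return, if any
    /^$/ { next }                   # ignore blank lines
    /^>/ {                          # header line
      if (inseq) printf "\n"        # end the previous record sequence line
      print                         # keep the header on its own line
      inseq = 0
      next
    }
    { printf "%s", $0; inseq = 1 }  # sequence line: append without a newline
    END { if (inseq) printf "\n" }
  ' "$1" | md5sum | cut -d' ' -f1
}

# Usage: identical output means identical headers and sequences.
# unwrapped_md5 z1.fa
# unwrapped_md5 z2.fa
```

Because headers are included, two assemblies with the same sequences but different record names will still get different checksums. In that case, dropping the `print` of header lines would compare sequence content only.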
Which should we remove?
Thanks for spotting this. v1_0 is the correct version of the genome assembly; v2_1 is just a reannotation (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Alyrata). So v1_0 stays. I'm taking notes to make sure I link the annotations to the correct assembly when adding them.
Here are two more duplicates:
Capsella_rubella_JGI_annotation_v1_0_on_assembly_v1 == Capsella_rubella__JGI_v1_0
Ostreococcus_lucimarinus_JGI_2_0 == Ostreococcus_lucimarinus_JGI_v2_0_assembly_and_annotation
Which should I remove?
Please keep Capsella_rubella__JGI_v1_0 and Ostreococcus_lucimarinus_JGI_2_0
thanks
Also: Musa_acuminata_Banana_Genome_v1_0 == Musa_acuminata_Genescope-Cirad
Please keep: Musa_acuminata_Banana_Genome_v1_0
Related to #8
I think these two contain identical sequences with different line wrapping:
Arabidopsis_lyrata_JGI_v1_0-fasta-fasta and Arabidopsis_lyrata__JGI_v2_1-fasta-fasta
@ieguinoa how do you want to proceed?