ncbi / sra-tools

SRA Tools

Is it common for long-read samples to have different headers and orders, even though all reads are present? #913

Closed: AnupamGautam closed this issue 6 months ago

AnupamGautam commented 6 months ago

Dear Developers,

I downloaded the same sample from both SRA and ENA, but the order and the headers of the reads differ between the two files (for short-read samples they seem to match). For example, the first record of the FASTQ file from ENA is:

@ERR5029616.1 00000060-95e4-4ee4-85ee-a9dcd2506e0e
GTTGTAGCACCCGCGAGCGCGCCCAGTAGATGTCGGTGCCGTCCTCGAAGATGACATACGAACGACACCCCAAAATTGGATAGTCCGCGCACTACCTTGGATTTCGGCACGCTCAGCATGGCGGTGGTGGCGAGCGGATAGGTGACTTGGTCCTCGACCACCTGCGGCGCCTGCCCGGGGTACTCCGTATAGATGATGACTGGACGTCGGACAGATCCGGAATCGCGTCGATCCGGCGTCTTCCATACCGAATATATGCCCCCGATTACGACGAGAGAATGTTCCCAGCAGCACCAGGAAGACATGCCGCACCGACCATTCTATGATGTTGTTGAGCATGTCAGTGCCTCCGTGGCCGCTGGCGGCAGAGCTGGCGGCCACCGGTTGGATGCGCGCGATGACGTACTCGCCGGCTGACTCCTCGATTAGTTCGAAGTCGACCTTCTGACCGGGTTTCAACGATCGAAGCAATGCCGGGTCCAAAACCCGGAAATCCATGATCATCGCGGGCCACTGAAGGCTTGCGATCGGACCATGGGCAAGCGTTACCGTCACATGGGCCAGATCGATCACTATGTTGCCTCCCCGTGGTGATTTGCGAGCGCGGAGTGCGGCAGACGCGCCGGGCTTGACCTGTTTCTTCCCGTCCGGCTTGGAACTGTATGGGCGCTGTGGCCGCCGCTGACAATACGTAACTT
+
++,($+3:>EA@2/0137<1;;>C(>=>>C00/=47?9?215794><864:4:310-+/11C?CA=?GCDABC5222JEB08+79A7)))8@3115559;9AE?G05>?;>H=@H=?B?@E<<BADA,,,(**<;&=:33>:5:3)(%22&&+&@DF??>IGHH55>?CD622/))---..48B4>;HDABCC?=/(A(7,<<BE96B=>B@ADA++CEDD@;8/@;9<66--86<E<><857<9779122=;:89::<;547?GBD?>?E--$%73:;>@EA<;;22,,=9;=:5DF>>:9;<<=?<5,,:5<045.=<<+A>BD44C>A::AB>;;%%,=+49/,**0/*'&&0/(.66:-.0,-.)-.)159:??@8HE89GAE@<6633%%8AA;?8/0.-24889:7>>9;<=><=?7=../,;,?;8@@>=@ADBBE@F:?>EE>8CED,;&'06.)=1684;>?DEIEFD1%%:0155BA@CC7;<44<(0774>@BA99D8CC?0/020/9-35428:A>?CC?EB9?C?>CEF5:<7;0+0435#'(000?;555DB?>=<B+15B;<K872117;:*:=9>=G?N@G*741=33134%5??<>D@FF?@G9CHB*&6**+9:9787;<=?KB89-477F=A--,03-.0:/..001;<@=<40/+)*;B@2344372266@>@:5$$%

The first record of the FASTQ file from SRA, by contrast, is:

@ERR5029616.1 1 length=1955
ACATTCCGTCGCGCCTAGACTGTTTCGCGAACGATAACATTCTTTTCCAACCAAATAGAAATAATGGAAAGGTCTGTTGGCGTAATTCCCGCTCACGCGACCGGCTTTCCCAGTGAATCGGGGCGGGCGACCGCCAATTTCTGCCGTGTTTCCGCCGCAAGCCCCAATTTATCGGTCGTAGTCGATGTCCTCGATTGGGCGGGCATGGCCGCGAGCCAGCTTTCATTCTACAATGGTTGAAGGACGCAGAACGCAGCCTTCATACTTCAAATCGGTTCGTCAGCTCCCCACAACTCTGACGCCCTTCTCGAATGTAATGTGGCAGTGCGTCCAACTGAAAATCGGCATTCTCAAGCGCGGCGACTGGTTACCCTGGAACCTCAATTCTTGACCGGTCTCAAAAGGGTCTCAAGGAGCGCACATTTTGTCGCGGAACACCGCATTGTCTAACTGAGTCGATCAGTCCAATTTCCGAATCTTTTTGGGTCAGCCTTTGATCCGCGTTGTCCTGGCGGAGAGAAGAGAAGCTTCGTTCCTCGGCGCGGCTCGTAAACATTCTATACGGTTCTGCGTCTGTGCCGTTACTGGTCAGATCGTCGATCATCCACCCCAATATGCTTCGCGGCGGGAGATAACGAGCGGTCGGCATTGGCGCACTTTAAGGCCGCATTTGTGCCTGTAATAAGTCCTTGCGCAGCAGCTTCTTCCATACCCAGAAGTGCCGTTTGCCCGGCCAGCAAAATGAGGCCCGCGACGAGTTTGGTTTCAACGTCAGGTGAAGTTGAGTCGGTGAAATAATCGTATTCGACCGCCCGGCCGCATGAGTTCGGCTCGCTCAAGGCCTGGATGGATCGGATCAATTCTATCTGGACGTCGCCACAACGGGCTCGTCGAGAGCGTTGACATATAGAATTCTCCGGTATGTCTTTCCTCTGGCTCGAAACACCCGATACGAAGGTTTTATCTATGAATTTGACTATCTTGTCCTCGATGGATGGACAATGTCGCGGGCCTGTTCCTGGATCACCGGAATATGATGAAGTCGTGGAGGTTTCTGCGGTTATTTCATGGGTTTGCGCCCTGTCAGTCCAGCAGCGCAAACCCGCCGAAATAACCAGCTCCATCGATCACTCCTGTCCTGGCGAGGATCAGGAACAGGCCCATGTGTCCGTCCGTCGAAGAACTAGATAGTCAGAATTCATCTAGTCATCTTCGCATCAATGTTCTCAGCCAGAAGGCATACCGAATTCTATGTCAACGGGCTTCCTCGACTGCCATGATAAATTTGATCCATCCGGCGCTGACGAGCCGAGCTCATGCGGCCGGTGCGGTCGAGCCCGGTCATTTCCCCACCGACTCAACTTTCTGGCGGTGAAGCAAGCTCGTCATGGCCTCACGGCCGAAGCACTTCAAGGTGTGAAAGCTGCTGCGCAGGGCTTATTGCTGGCACAAATGCGGCAGCAAAAACAACGTCGCCGACCGCTCGTTATCGCCTGCAAAGCGATCGACGTCTCGACGGCAGACAACCGTAAGTGTTTGCCGGCCGCGCGAGCGGCTTCCCGCCAGAACAACACTTGAACTGACAAAGAGATTCGAAAGTGAACTTCGATCGACCAGTGCGGTGTTCCGCGACCGTGCGCCCTTGAACCTGAGAACCTCGGTCGAAGTGGTCCAGGGCAAATCGTCGCCGCGCTTTCTGGTGGCCGGTTGAACGCACTGCCACATTGTTTTCGAGCTGGCGTCGCGGGTTGTGAGGTGGTCGAAAACCGATTTGAGTATGGCTTATGTTCGCCGCATGTGGCTGGCTATGCCCGCGAATTCCGAGCGGTATCGATACGAACTGGTGAGGTTGCGGCAGGGAGAGCACGGCAGGGCAGCGGTCGCCATTCCATCGAAGCCAGTCATATGGCCAGACCTTTCCGTATTTCTATTTGGCGAGGTTGATAATTCTGAGC
+ERR5029616.1 1 length=1955
;6114466@A>%@D?4/-2192::863**294217*'$%&'+//('$/+,-6+-948>@B>878:=@;;?><<:B;B@=:211.005>HGAH;B4495?<@?FB5557.5,%),*2:3=++92-3;D>=ECC@A;79DF?7=?DDB<>=?@:A9FB61/21C@A?)--0'&($%'-'$$&)76...,$$'%/%%(+,,7:5/50/0/23+1+))/42,,:;;;5(((&$"$$$(*-10*&&&')''((+%$%')))++765)''$$&&&'-'.+/30+#%,/.<A<?=903;;95+*+(.+,,&%&'$)),4779:;?A?A3---33;7((4...;9=:C:::0.-389;:>&.%)(''''-,$$%&75,042147422504-1./+(*55+0*'(//636333*>?>472<6-&6;?<9@=AA@'*&1((%134-0.+$$$%$'*605+%5).-2,#%./+/((1+-58?-?79*$656@?<552,69:99A@<@<7994;:A==((4615/14/')%))%&#&%&$%&+5D@-3)-&',-/.6>?B9:;;64492:.-,-&%5?=8$$/*9981=79B;FD;DE@=7G@B4:AB<<E>9?;$$0)(&&&$$.8=;GJIF@=DF011C53?8;9))(876795.0**//.*('$&')*+48@BDFACEEDG3<8)))))++;;>;:<>A>B42).')$%%-2./1%,,79:6+-%*3-?/BE;FA?%$$$&,/.)+-::<9$$+62012''550/.6<?=><@E1;68A0++217<:3002A51?5<39'''*C?A=6921210/8678*(*0>?GD;85**'/?:3).7,/')57-9;;::==17--+&-/++*$$+,+2843429238&')4$##%&#%$$%'%''%+0)%%%+(&&')#$#$&$($'%&'&)%67<<>:9<::<9/53;?::;1.*62-,&%$$&%$$(,33/2:?@63/<-.-,+*,./,,(*329A883387744:4<C554.$12%%,2(($'<==E;8@<>D>97<>9:?BC77-.+-,)(*+'0(&-++'69;;4+(($%&%4*-/7;4579>==<+%(&$#%$#%%&$$4&(&&0(('$&$.1#''%)*,)'$$&($'$(07+.2.)))'&+%)+$$$%##'%%$+%'%,'/01333/9DC>##'/+++++./0$/0/(&%%%$$')'&(..1.'%&%&''%#$$*./$$&&&-0-*')$#$$%**$,)*1,1%%'($)(&&,1.&-$1=255301%)%%&1**,%*+-(&%#%&(%"$&&'())*%#'133.$$/(,)(&$,$)-/431%$*,*1),**'*+,((##$$')+($&'')%'%#&&(,78;4'./731.0$'%%')*()&$-&+(''')-/28/-121&$%&&%&$#&(+*'%&%($$$$'%"$%'%%+.2-(%213335-)))6&'778++,)'/(-(&('(38:33,&%$&--**%#$'#$'$&,,+%-.($&%$#$('(*)&#%&%(&$&$$,/7-011/)&'&'$$#$'%+))''$#$$$$%,--$%&''')(%%$%&$$%(**+,1447038++,-*'&#$%,,5327*/01.&'+(11330),*2.'/'')+((1,2$$+%)*&'&--+/0,/'+,+%$#*,1430*$$(%&&*%&%5*+2210%%$'%%&%$$'''2),&(%*+10.&22260/*.$$&)())'8/6.1'.,21)3&'1399400)%%$&(+,114;-2)/21+)1)*&&%%$%'%(%'$&'&(%(*(*-2-5',--,).87890'')$'*&('$#%$&&&(*),)+)'$&''(/34*'%&&')/1+%+''+)06/)%'-(($#$'$$(&$.511<760%%('%($&$(,;=1((2+)()2--0.2'&-++.45-*+%&%#&)&$((#%&($%''''200247/1$%'%%)&$$&(*+%#$$$**0''&).'//,('%#
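
One way to confirm that both files hold the same reads regardless of header text and record order is to compare the multiset of sequences in each file. The following is a minimal sketch, assuming plain four-line FASTQ records with no trailing blank lines; the function names `sequence_counts` and `same_reads` and the toy records are hypothetical and only illustrate the idea. It compares sequences exactly and does not account for reverse complements or trimmed reads.

```python
import io
from collections import Counter
from itertools import islice


def sequence_counts(handle):
    """Count every read sequence in a four-line-per-record FASTQ stream."""
    counts = Counter()
    while True:
        # Read one record: header, sequence, separator, quality.
        record = list(islice(handle, 4))
        if not record:
            break
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # Only the sequence line is used; headers and order are ignored.
        counts[record[1].strip()] += 1
    return counts


def same_reads(handle_a, handle_b):
    """Return True if both streams contain the same sequences with the same counts."""
    return sequence_counts(handle_a) == sequence_counts(handle_b)


# Toy data (not taken from ERR5029616): same reads, different headers and order.
ena_like = io.StringIO(
    "@r.1 uuid-a\nACGT\n+\nIIII\n"
    "@r.2 uuid-b\nGGCC\n+\nIIII\n"
)
sra_like = io.StringIO(
    "@r.1 1 length=4\nGGCC\n+r.1 1 length=4\nIIII\n"
    "@r.2 2 length=4\nACGT\n+r.2 2 length=4\nIIII\n"
)
print(same_reads(ena_like, sra_like))
```

For real downloads, the streams could come from `open(path)` or, for compressed files, `gzip.open(path, "rt")`. Holding every sequence in memory may be impractical for very large runs, in which case hashing each sequence before counting is one possible variation.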

Best regards, Anupam

durbrow commented 6 months ago

Your question is about SRA data, not the tools. Please send it to sra@ncbi.nlm.nih.gov.