morispi / HG-CoLoR

Hybrid method based on a variable-order de bruijn Graph for the error Correction of Long Reads
GNU Affero General Public License v3.0

Fail due to erroneous string in alignment file? #3

Closed RaverJay closed 6 years ago

RaverJay commented 6 years ago

Hey, I'm trying to correct some Nanopore reads with Illumina reads of the same sample.

The script died with a Python error because of an overly long file name:

./HG-CoLoR -j 40 --maxorder 50 --longreads ../../../indelcorr/corona_only_filtlong1.fastq --shortreads ../../../nanocorr/corona_illumina.fastq --out ../corona_only_filtlong1.fa --tmpdir tmp
[Wed Feb 21 10:47:41 CET 2018] Correcting the short reads
[Wed Feb 21 10:49:48 CET 2018] Removing short reads containing weak K-mers
[Wed Feb 21 10:58:38 CET 2018] Building the graph
[Wed Feb 21 11:01:33 CET 2018] Preparing the raw long reads temporary files
[Wed Feb 21 11:01:34 CET 2018] Aligning the short reads on the long reads
[Wed Feb 21 11:29:56 CET 2018] Preparing the alignments temporary files
Traceback (most recent call last):
  File "/mnt/mahlzeitlocal/sebastian/quasispecies/hybridcorr/hg-color/HG-CoLoR/bin/filterOutShortAlignments.py", line 17, in <module>
    out = open(sys.argv[3] + curFile, "w")
OSError: [Errno 36] File name too long: 'tmp/HGC_19623/Alignments/:+04.7081GbicYQQIB+/)4/bJJ+KOTUE$-2:*D5@:E,=+".484FACA;5AKKUR0AA705;$#/*-\'\'UPGLHA.#(%*+/A]SaH:"T"8"2E`ZJ5744**&,\'\'32AY>+ED<?D?>9C8QG:FA;/3..005BU=RK6>?.8NNH>KEGKJ5+3;4.-4,(,",;E46,K:C:78C30959N>63@99?*BHK@J)VB7"*)(=EG8UF:A0?*,17121217?3@Wh^47`VC/(69M^M76OE&%.\'7.\'"%\'-;Mj^]BMQNF;E915@2.1=\'AJE8*9-5DA()/*05(/$:;08,$"2"+710;.5++5F36MGFH96>975:[2$=V$>%(8G4hbeEG-3*(=.,(8BE5,094/.;&.=8-0343(*5L="%$,36:MM3.""1)*Q6EEID4B<25,FB.H\'>:B7""$"#"$,%#%*&%"%"")/%.,.%43-$-22:N@25/*()$-+LV[C3.2,6(/\'6247+,:,-0>DIKB<<2*,"C@7$;0-/>-,),F%5+,0<;O5D4H7E(%/1;4)3".4FQJFOCCB19C=?HKIYG<%*".42,.7)23>5DMF;0A5+>7^a9N8+6%*\'$327M@".2=EBYOIU5A=FaP/$%0DB<:.SNDPXC*?>>:C%.-<2$9D@D/15-(""//I9OVU9.\'MTOKQJJ>.(#+(47EG@3DM42%.0B<Y<@4V:,><:3"16)61;3F?eK>(2;.).+9>KGRHB5Ob:1-*+,#("$&-G=:0JG4&"""*#%85,(6.*)#269-\'-01<K9.F8?M1+&""%\')"7)+5Q+>"3>:.@AJ95B915/;476SN]5"""$"#"0+7\'%*;9O*9Y:2RcD\'E=:B+0-+.(""%$-,0\'%#98JN2FK1BN<UYRH/)3%,>HY4,;M689;22?MV(_YT#9)/<&0<KLB2=0@=.0&/1O\\a96-31:@-,T8BHDMEM;7@_=M9C751-HLQ@;3",&\'"(0#"##\'"""#+304#5KJE=97/85=&-$"*<(9<(:65D6?G]B+"+-*7"")-$0A".8;829/"3>.=-0,=8-$+"/),8494ACAHE<(*GRh]47-8)D8<76<\\HURH-),#4DCL,01368;9.2433TU[6:1.6?7))6;"-/0-&))>\'#9EfLJTD,0..B17,0$?:AR>;:CNE=IMFBM9NMC26-0""-$*\'.NB[S;A1=38+>//E<UO/2":CH4F3BE5DCI15Q:T<;2$"63/(&%($%/>0B<)\'+(3d22b0L>+21$%%305).#".:*(;"""4"C63),60+))-=7YjVacZ3J\'9=;"B+MK-)#/1F:8&*.EEVDbfST6ATLG$YCJH763/-/C8-.<EPXFF\'%%#(9[[HYS-@;NFBO7S=12fGI,\'\'&%"41)+"*@,E-")//#;2*<:44"&94$(-(=)2(*1<8\'-3(/>54&+#$$5<*,";,<;A8a;MFNI87L./$1ND"+=MAHAI^D3,K/3CUHH<CE8B1LJ?D@B=F919E3<U>XU@:>%.H=93267/3>2%\'-+BCE2+#2$+\\725188;4WH5@DJDK(6;=SD;EGCH\'1H=QJA?2#6:6TSXGa@;\'&$$134014*$.$7((838^]b>+4<K>RFAL?:./,-D""2"(\'410/.:8DE485329:4#)/&"#)15II8QA\'.522.-"+H/5@61SLNJB/++#).\'"73/;HBLGSYG9780I=GFKI57.]OY^Q:+/4""-1##+850:"4.ME7AS>=6*A2;#%,*:G;FLEB8VBA<,3+=.\'4*.048=INNUJLJH7QD3J/0=(?NALEHUKSM(G;QNJ-\'"$*.92<?$26:2@:/=7F;)2;5?BA<,+YB9(%0"(%(,%/=O?@+B"",-2R/$7987G-"#:04$-#"#"%"91L9K=E8"0HL
CM;AC`UL)%<%)**(F01368D;"#.84138""$KYE,-*+5>%&%/D0%3=44G:S[^OH,18@11-71+,/FOJSJLU7GQQ=IK[OT7<QTRMR327.IGcR[bA5+4%"#8("56;+44*#*<,0@HNOJb/"/&%(%\'"%*C6@96;,QG68?;<?ODN`44,&-2*-,4./3<B,C296-5;8;<0#\'(:@BJH5869-23++#(/>38969\'H@(]TU\\;SB,;=<@FKIG4IGF)*-&*F-D.(+-))%%0-267#)?&)/*A:;JP7$+3<8E=M=EBFD<<<48S86470)(,,J?\\gO>51\'/:**.0*7TANHE8YNVX84@E1MOE.B0,&1+-\'+$"*+&.,?*-).(+%##"-67.5*(*)0G;+51#2.152LE0.S@8-6>Y986-70/1-/A,JDE&G@A<A50.\'\'7\'0;B07"4(<9+@=7DN:-(C;A%/-53THIV<CEZK#$A7:1627FD673OA574JFRg_]^VUITESJE:3GC=&<6+))F=?FB4>BH903K-.0S-+/?@N"-7%(/?C4+"*")"0++28:;F<Y[LL=BBJQ6/)%.+"$-9\'"&;AHI0,)"#9?]@A#848NdJ=/(2&1*00L?IFB6><<""0*.(&)?HAY8599+\'"\'"/*$2$\'2(<\'?%-1-*,%+5D:SPIDINGK%(+9O29>HBfX10"-$--OJ6]E/$%$\'#2BF9D<804?1.&%"#-@*6(.EEHE@:0WD;=GdLZBG5G=40H41@3>++,+8+B9JVUB7,>C4?73/)$,5FI8FD^I\\34\'\'.14:@;,36AA2570,3)*=E59?F(EK=YSO(P5/A_[NK/#.04:&%/_:>MZ-&;53\'\'\',@A=BGJ<1(7G1AOJIQHC)FMQR6>,Q52OVHS="%""0*/-?1)&+-$\',B+@578+6$%+,1M>-$\',0BL6-12/><=:""/O98GKB@9"""2,902/1,07\'$/>:120>SLBMT+,8-5,#"**.?B*/22,+HK26hSF2+,-)+$&,2508"-7/HA:I7EBE&=<\'%%)"<.;?-OI8@?0%\',C+03"95:$#92220123/-,,;CCEDE2*(&+4(/-5DB2*2:DUTI*H:H8J1%"""$+XQU>:K542B3"(>&+/5.9HQ5>"#270:/:+48;@K>.66?/@8D9DS@5-M>@^aQ[\\aMK+-".+7(&"&\'*""".,09,*.OHJEK)"$""&%"&%012L343KF3<><B^B@G3?-)6<<AV6ii[4PLUJ-95<2.-06B>:3/.3"#(""*%"#)7+,\'**)3:7665;?@F$%%&,,D&-DT6640&67,""#/+$,:PMNJXHMQIN9)00<\'0HbPRW<BCGDP?/5;/75.\'$/1\'\'$(7+\'Pb@:5"(0223#&()))?;9;#=01:aF.0HGBPWQV2.HHJ[]=a\\bULYd8PSJYcH;:MXJY*JB58AI41P=TCNI?39<=37*0@A<&,48.04ID=/%C==;DC@?E:69/Cd6:X6C(,2>OPB8\':9-\'")2*;3+*1))*$"&0"6&;:>500*((+"%&)6@=MFKBO1\'*78&0\'*5?I?385\'(7.(4>&/9.0>8B46?@\'LSV04.+&""#,,+"C@P5.45>QGELJB8JLEKEIPM--2449F01437-24T3?+&""\'(@<1"++28,#,75;(-06K1e8$$H.9<H6)804>8@>5=,*)\'"%#)7<EAG505##,+)C>G32C/&:/\'158CY;J\'$$.*)2.47G&\'%"+%@*>**D.(2B=7QDZC2(-3"499>/K*""$8MD+-B/1%#&*)9%3"1"""#%%)CJMXC4<M$;<;249LU]ODRC0E@-..34%$.A,,-+<`$\'.S8RI3$&).""##)CGQKKQcG<6HF.;.%3&NTJ+\'<$$2&).%I_MR]2P,"&&)#)%"./<F0,&8%<1G=96/<;CI[Tf]9:>7I/#9"325\'$+\'//2-",/5=.BOLPVQN\\
UPTdNK=OI0R2..5%"*6/(158;;9-7FI<DWZSV]Y^bT5aQW:<??71]VCBDKIE;CP6%+\'#/.-:67:,=?5H@;(+"",7<@TVF:7$2LAO,4D<,8""\'58B=.8,)*4*LL"(""$++/0;2**)V\\GMQ2L395=@.Nd=Z83&$\'R-00%&3MlPYTT?C@A6DZ@9"<6,/"*I18)$78-&-\'\'#)"\'\'"<:)1D;;@DQ^WY2:<B0KRC,*53(+"#-)%A,$5J==997AE.2J#.;Q,D6<I-,#"82064WD447*(%;1383\'.4,0\'64D6*#.@290*/08($9&""+(%..*-$1=:;FB"0(")("$=/\\XQ?V9aL70-+"-2;46F56;=CGN&-4;I*0>#"D2,+4/(6;:*"%,3>@Y=?@6;<XK?@("(""*;H>5d`\\Y_QGH.=#\'265=..)3&3*";9QGONM.23&)00F#,21/\'(&/*"#"52L;N7<-055Y>;;9\'%7PO_;,&"%7DKJFPYGK@L=KhdXB(+\'#(/*/-7JdO9<N8bhU97/A@:7376)3/91%..\'%)%)**)7<9C-./6:=G4\'.&+`BCIILMN=CJF</BO>PGJMAK\\976.4.\',4-KA?S,3\'\':94-,\'.BEE;6&#$);>*,3\'):4?1#"(6ML<6>FENQJS;6NA3f;:7%IFG@P>3.aJ^S_eU:E>REK;.?;91:\'09X=WWLO@(,3&"1W.("\'"7*/+@SD5-\')>YMUYb_A?RV?-+%0.0506Q66//1;)979+*6,+-8.3/2$3U:614//7.-#%"#",6P9C5B**)4++21\')*;//<GHjA?V8QP@`:88I<7>RB]C/-((+&>7VDJ:00,>A8@@D5;()\'"""#%##+"$&&(>7+&-*-("*%)146>CE1C+F12940<161-2"$*0168GS?Z]IM4"32()13.62L78.+(1$463\'GKh^.0")5K%\'"%+$&""2\'..453#$07045*$"$""#,+29??59=(&2($0"#1>11*\'$\'-\'8=8?H<?A9L5+O8,&\'"$/))855-)(%6F0)*-$/.%1#=6G=H>/%&"")NM,/03\')0H\'?,)&$4&=@IPF+>+&,"""0&529%"%""""""%$$77NCM&600-.+(555"\'13&*6KL@M^A/)D9+%EdXB?EJ72/09J@F/;019/.&*-"#%#<E.;3.5D=<>JRRG:#3#""26A<32"&4<.%*BE@J,/+.2*(*<<V:5/980>ULQ-8?/:I;C7\'/HA?^^jG;4YX44::^D=&)-\'6TIVM97)*=*"=IL3O=5:8DNZ;837AA7:D5?3&0*@$"($(8<aMB)),7**<BN;2#.&\'""\'19GN:=GGI/K<:&$(.;HA8B)-7\'>0K(\'=209::()4;$\'(317Q?BALDQiXSMKMA=5E+7?2;7TcQT;BC2,.0*393(>=HD@BgL?-*5\'&%#"*(&2#8""""".*.F""RH;6R:A<:D<,\'.,+1B%2A+5-1--EI44C08DO+J5\'.2*Q%*;fT314F@J`_L@&YDCN/\'"""""$28QKNN@RUOU]N@D)<K3L/N[[_TRI?4QJYMRGH11<MWNYP/1""1;1=743A43"/1<,.9>4?H_NI+,AJM_I=(13\'$\'.7"$+\'0G3*>MU41@)3?78I+87.06;@N21346VcHcSI^VbeYf\\c1*(,\\510(*0922JLAGRE#GI4&E3$%%$".$\'/"\'+&@2/DH7@>7/D<ER6@*4>+1=*#"""70^67TD>(,;B?B/*("*38,-$)&))&&9H*,2O12145WA-??7D8+5$\'@=3K7?941=*1"?/Z95522DGA=81:D5E0L:ZT8a-,=%,->F@W]A8;(\'*\'+50KOB=RS9&,@1MLBAFDC_:-7U6+#(3FL39,"(A<59M9A0#F07*PhC+8*840/95;><#)3.(C/A8/"$)*,*4@I4_XL:8@beK:GS@I__A@:9;<F
7/.("%)\'-.7;0GHFC@hhH-=AC;3(.-/RYJOBLMZTE=+$"(3DOS>63#*1019N()/6)>=P;LP6RNI25+/(&,*+<*-<58%B6AD1;R8./&\'&EI^ID99@UDIK;.,*&.2*P;3\'-%XQ;1;.657G5),&\'&6((-*/-.,)12047MF;9$"\'"#)8ZYbEcOO`H_A1#-"%"+1=9?V?JI:WC/16#""=3-)$#1-.1886,0-9."9/.\'4-&2"%&(),=8C740)*)&""8+\')120\'*("1"%O9G:0"""72=.G[=5850*)"\'%0/3678NKSB@PHB=.6>5"84)"=USah]SJAB31#\'#0<cK/.-"#VFC4,309R3-?@"5(,E*B0<4/C7+)-5&&:0210%\'.AH-5a^?:9<"$07R3Zk8253%"""0R4c=\\\\MB<@>SL,7P21969:")\'3?-[LZ:Q@VZ,=/)<D+,JA0.6.;8>2FF@.E89?*=G63"1@AF3WNHH*/4B?+8B4L%-.%$*,./LKXDBMSMPJ3>"=<?8.$8334:A,(\'"8/.2>+&,;DD25;7=>HED>9;650$)*"?07#H=>?NF7:GE<MOLQPG94Q88W`<LIB+5*"&/\'8<;H=704EL::#3>E=2H<9)-(+/&.1/]>C?NA(,@QQ+3+,OST6]U6I/XB916%:3JR8UPR*9,B3C9)226-&/\'(/#/PGX[O?70DBOK8&*4$32;24B=7:<E=6956/9:624;I;;>A22-SJO@G9BQUI[FVF=+9"",%)"3<,-.B,;@CTKL-"&-0/<=5<7*C#&&)""$"+"6214<3&.7/8-00""&(,#"+-2(H3&$*;@?&#((&%+&%0*-$+""""#%,(""""""'

Going through filterOutShortAlignments.py shows this is from the alignment file:

line = f.readline()
if line != "":
    t = line.split("\t");
    curFile = t[2]
    out = open(sys.argv[3] + curFile, "w")

Any idea how that comically long string might have ended up there?

Cheers

RaverJay commented 6 years ago

Never mind, it was a quality string from my long-reads FASTQ file, which I guess is not supported as long-reads input. Trying again with FASTA format.
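
For anyone who hits the same error, here is a minimal sketch of the FASTQ-to-FASTA conversion mentioned above. It assumes plain four-line FASTQ records (sequence and quality not wrapped across lines). `fastq_to_fasta` is a hypothetical helper written for illustration and is not part of HG-CoLoR. Records are read by position rather than by looking for `@`, because a quality string can itself start with `@`.

```python
import io


def fastq_to_fasta(fastq_in, fasta_out):
    """Copy four-line FASTQ records to FASTA; return the number of records written.

    Hypothetical helper, assuming unwrapped four-line FASTQ records.
    """
    count = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = fastq_in.readline().rstrip("\r\n")
        separator = fastq_in.readline()
        fastq_in.readline()  # quality line: read and discard
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + header.strip())
        # FASTA header is the FASTQ header with '@' swapped for '>'
        fasta_out.write(">" + header[1:].rstrip("\r\n") + "\n")
        fasta_out.write(sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Small in-memory demo; note the first quality string starts with '@'.
    demo = io.StringIO("@read1\nACGT\n+\n@III\n@read2\nGGCC\n+\n!!!!\n")
    result = io.StringIO()
    fastq_to_fasta(demo, result)
    print(result.getvalue(), end="")
```

With real files, the same function can be called on two handles from `open(...)`, one opened for reading and one for writing. The resulting FASTA file is then what gets passed to `--longreads`.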