vpc-ccg / calib

Calib clusters barcode tagged paired-end reads based on their barcode and sequence similarity.
MIT License

calib and use with conda #28

Closed ChadFibke closed 4 years ago

ChadFibke commented 4 years ago

Hey @baraaorabi,

I was wondering how the consensus and error correction steps are performed with the conda-installed version of calib.

I was able to generate test.cluster with the following command:

calib --input-forward R1.fastq.gz --input-reverse R2.fastq.gz --barcode-length 4 --output-prefix test. --minimizer-count 7 --kmer-size 8 --error-tolerance 1 --minimizer-threshold 2

However, I'm unable to proceed with the consensus and error correction steps, because the conda-installed version offers no additional calib arguments for them:

$ calib --help
Combined barcode lengths must be a positive integer and each mate barcode length must be non-negative! Note if both mates have the same barcode length you can use -l/--barcode-length parameter instead.
Calib: Clustering without alignment using LSH and MinHashing of barcoded reads
Usage: calib [--PARAMETER VALUE]
Example: calib -f R1.fastq -r R2.fastq -o my_out. -e 1 -l 8 -m 5 -t 2 -k 4 --silent
Calib's paramters arguments:
-f --input-forward (type: string; REQUIRED paramter)
-r --input-reverse (type: string; REQUIRED paramter)
-o --output-prefix (type: string; REQUIRED paramter)
-s --silent (type: no value; default: unset)
-q --no-sort (type: no value; default: unset)
-g --gzip-input (type: no value; default: unset)
-l --barcode-length (type: int; REQUIRED paramter unless -l1 and -l2 are provided)
-l1 --barcode-length-1 (type: int; REQUIRED paramter unless -l is provided)
-l2 --barcode-length-2 (type: int; REQUIRED paramter unless -l is provided)
-p --ignored-sequence-prefix-length (type: int; default: 0)
-m --minimizer-count (type: int; default: Depends on observed read length;)
-k --kmer-size (type: int; default: Depends on observed read length;)
-e --error-tolerance (type: int; default: Depends on observed read length;)
-t --minimizer-threshold (type: int; default: Depends on observed read length;)
-c --threads (type: int; default: 1)
-h --help

Am I missing something here?

Best, Chad

baraaorabi commented 4 years ago

Hello Chad,

Sorry for the (very) late response.

So, the conda version installs only the clustering module, not the error correction module, because I ran into some issues adding the SPOA dependency on bioconda. Please let me know if using conda is a must for your tests, and I can give it another try over the weekend, especially since the SPOA conda package has been updated recently.

ChadFibke commented 4 years ago

Hey Baraa,

No problem! Conda would be preferable, and I'm sure many others would appreciate it! However, if that isn't possible, I'm happy to install it using the instructions in the README. I eventually installed calib that way and was able to set up both calib and calib_cons. However, after running:

calib --input-forward test_R1.fastq.gz --input-reverse test_R2.fastq.gz --barcode-length 4 --output-prefix test --minimizer-count 7 --kmer-size 8 --error-tolerance 1 --minimizer-threshold 2

Extracting minimizers and barcodes...
Memory before reading FASTQ: 1MB
Memory right after reading FASTQ: 399MB
Memory after reserving for read_to_node_vector & node_to_minimizers: 399MB
Memory after filling barcode_to_node_map: 597MB
Memory after releasing node_to_read_map: 584MB
Memory after reserving barcode_to_nodes_vector: 584MB
Memory after filling barcodes & barcode_to_nodes_vector: 676MB
Memory after releasing barcode_to_node_map: 663MB
Read count: 1795912
Node count: 1795912
Barcode count: 1737251
Memory after exiting extract_barcodes_and_minimizers(): 663MB
Clustering...
Adding edges due to barcode barcode similarity
Number of masks is 8
01111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
10111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11011111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11101111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11110111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111011 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111101 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111110 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
On thread 0 building all LSH took: 0
On thread 0 processing all LSH took: 0
On thread 0 merging local graph with global graph
On thread 0 merging took 0
Building the graph on 1 thread(s) took 0
Adding edges between nodes of identical barcodes with thread 0
Adding edges due to barcodes similarity took: 213
Memory after adding edges: 433MB
Extracting clusters
Extracting clusters took: 0
Memory extracting clusters: 433MB
Memory after releasing graph: 392MB
Outputting clusters
min_records_per_tmp_file 180224
There are 1795912 clusters
There are 10 temp files
There are 10 temp files
Processing file testtemp_0
Processing file testtemp_1
Processing file testtemp_2
Processing file testtemp_3
Processing file testtemp_4
Processing file testtemp_5
Processing file testtemp_6
Processing file testtemp_7
Processing file testtemp_8
Processing file testtemp_9
Outputting clusters took: 24
All done! Have a good day!

I received a binary testcluster file, which was ~2.7 GB and started with 863640 863640 17 followed by binary data instead of the expected TSV content. I then ran:

calib_cons -t 8 -c testcluster -q calib_1.fastq calib_2.fastq -o 1.out 2.out

Reading cluster file: testcluster
Reading fastq file: calib_1.fastq
Writing output files: 1.out
Reading fastq file: calib_2.fastq
Writing output files: 2.out

which resulted in empty MSA and FASTQ files (I think because of the improper cluster file). Please let me know if you would like me to move this additional problem to a separate issue!

baraaorabi commented 4 years ago

The .cluster output file is a regular tab-delimited text file. Can you show me the output of the head testcluster command? Note that calib_cons generates a consensus only for clusters of size >= --min-reads-per-cluster.

Also, what is the length of each read mate?

ChadFibke commented 4 years ago

head testcluster

863640 863640 17 [non-printable binary data; the other records in the output, such as 650030 650030 41 and 953250 953250 47, likewise begin with three integers followed by binary data]

I'm using 125 bp paired-end reads with 4 bp UMIs.

baraaorabi commented 4 years ago

Oh, you need to add -g to use gzipped input (it's in the --help output but not in the README!).
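One way to decide whether -g is needed is to look at the first two bytes of each input file, since every gzip stream starts with the magic bytes 0x1f 0x8b. The following is a minimal Python sketch and not part of Calib; the helper names `is_gzipped` and `gzip_flag` are hypothetical, and rejecting mixed inputs rests on the assumption (not confirmed in this thread) that the flag applies to both mates at once.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with Path(path).open("rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def gzip_flag(forward, reverse):
    """Return ["--gzip-input"] if both mates are gzipped, [] if neither is.

    Assumption: the flag covers both input files, so mixed inputs are rejected.
    """
    states = {is_gzipped(forward), is_gzipped(reverse)}
    if len(states) > 1:
        raise ValueError("one mate is gzipped and the other is not")
    return ["--gzip-input"] if states.pop() else []
```

The returned list can be appended to the calib argument list, so plain FASTQ inputs run without the flag and gzipped inputs run with it.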

baraaorabi commented 4 years ago

Added to README now.

Also, about multi-threading: Calib's runtime does not scale well with more threads. If you have multiple samples, run them in parallel, each on a single thread. Also, if you want a bit of a speedup, run with the --no-sort option (the calib_cons module doesn't need the cluster file to be sorted).
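The advice above (one thread per sample, several samples at once, --no-sort) could be scripted roughly as follows. This is a sketch under stated assumptions rather than an official workflow: the function names, the `samples` mapping and the `max_parallel` default are made up for illustration, and only flags listed in the --help output earlier in this thread are used. The clustering parameters (-m, -k, -e, -t) are left at their defaults here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def calib_command(forward, reverse, prefix, barcode_length, gzipped=True):
    """Build a single-threaded calib command line for one sample."""
    cmd = [
        "calib",
        "--input-forward", forward,
        "--input-reverse", reverse,
        "--output-prefix", prefix,
        "--barcode-length", str(barcode_length),
        "--threads", "1",  # one thread per sample
        "--no-sort",       # the consensus step does not need sorted clusters
    ]
    if gzipped:
        cmd.append("--gzip-input")
    return cmd


def run_samples(samples, barcode_length, max_parallel=4, runner=subprocess.run):
    """Run calib on several samples at once, each as its own process.

    `samples` maps an output prefix to a (forward, reverse) pair of paths.
    """
    commands = [
        calib_command(fwd, rev, prefix, barcode_length)
        for prefix, (fwd, rev) in samples.items()
    ]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # The Python threads only wait on the child processes.
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))
```

For example, `run_samples({"s1.": ("s1_R1.fastq.gz", "s1_R2.fastq.gz")}, barcode_length=4)` would start one calib process for that sample; `max_parallel` should be chosen to fit the available cores and memory.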

ChadFibke commented 4 years ago

That seems to be working, thank you :)

I ran it quickly on my negative control and I get the expected cluster TSV file (the larger data sets are still waiting in a queue)! Thanks for the additional comments on Calib's scalability. I'll close the issue and will reopen it if something goes haywire over the weekend!

ChadFibke commented 4 years ago

Alas, I've hit another snag.

I was able to generate a proper cluster file with the additional -g flag, which results in:

head test.cluster

1654325 3012539 9 @HS27_336:2:1101:2628:2159 NACTGGGCCCAGCTTGCTAGACAAATAGGAGCCAGCCTGAATGATGACATTCTTTTCGGGGTGTTCGCACAAAGCAAGCCAGATTCTGCCGAACCAATGGATCGATCTGCCATGTGTGCATTCCC #<:??GDGGGGGGFGGGCEGFGCEGGGGGGGGGGGGGGGGGGGGGGFGGGGGGEGGGGGDGGGGGGGGGGDGFEFGGGGGGGGGGGG>GECGGGGGGF8FGGGGGGGGGGGGGDDDD<D=GGGGG @HS27_336:2:1101:2628:2159 CGAGTCATTGTTTTTGTTGACGATCTTGTTGAAGAAGTCGTTGACATATTTGATAGGGAATGCACACATGGCAGATCGATCCATTGGTTCGGCAGAATCTGGCTTGCTTTGTGCGAACACCCCGA A?ABBGGGGGGDGGGGGGBGGGGGGGGGGGBEGEGGGGGBFGGGGGGGGGGG1FGG>FGGDGGGGGGGGGGGGGGFDFFDFGGDGBFG>FGCD>GGE>F@GGGGGGGGGGGGGG<FGGGGBGGGG
1073675 1462219 87 @HS27_336:2:1101:10913:2207 AGACTCGCCCGGCTAATTTTTGTATTCTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCTTGACCTCATGATATGCCCGCCTCGGCCTCCCAAGTGCGGGGGTTACAGC 3<<AFE<//;EBBGEGGGEGGG>FG>C<11=CF1E1C11:9//<:<:FGGFFD10=1EFGB@0FB11:1=FGG@GCC0DG>B@GGGGD00=/;ECC.C8?@0FBB@08;9C..6CDGG/9@ @HS27_336:2:1101:10913:2207 CTACTGTAGGAACAGGAAGAAGCTGCCAACAGCCCACAGGCCCAGCACAGAGGAAGGGAAGCTAACTAACCCTGGAACGTCGGGACAATGGGAGTAGCTAGGGCCTGAGTGGATTCGAAAATAAA ?BBBB1>FFDDF1;F0>=FDGGGCGCGDGGE1F>FGGG0BDCBBFGG1FG11B11:FG0C>FGGEDGGGC0:=E::>BFD@CGGEFGG0E@G=GGGCFFD@GE..FGGFGGDDGGGGGBBGGE
610675 5225094 89 @HS27_336:2:1101:10830:2217 CCTGTTGCTCAGCACCCGGGCTGGGGGGCTCGGCCTGAACCTCCAGTCGGCAGACACTGTGATCATTTTTGACAGCGACTGGAATCCTCACCAGGTAAAAGCGGGCCGGGCCCCAGGTCGAGGAG :>>@0B;FFFGBFGGGGGGDGGFGGDGGGGGGGGGG/DBGGGGGGGGE>CGGGGGEE=GGGGGGE=DGGGGGG0CECGGACGGGDGEGEGGGGGD@GG/@CE?.CAG;A;;>ACC>B/CACADCC @HS27_336:2:1101:10830:2217 GCTCTGGACCCTCAAGCCAGGGCCGTCTCCTCGAGGTTTTGCAGGCACCCCCTTCCTTCTCCTCGACCTGGGGCCCGGCCCGCTTTTACCTGGTGAGGATTCCAGTCGCTGTCAAAAATGATAAC 3A=BBGGEEBGGGGGGGGG0FFGGG/EEGEGBF>FDFGEGE>=:FCFEGGGGG>EFF1EF@F1FG<EEG00@>/C9CBCADDG<.CF<8@@DGGG=EG//8/68EGG6CGDD@/68/6@/8D/CG
922300 1441903 105 @HS27_336:2:1101:12402:2163 NTACTCGGGGCCAATAGCCCCCCACCTGACACCCCCAACCCTACATTTCTGCACAAAAGCCCCGCCTCCCTGGGGCTGCGGGCAGAGGGATGAGGCTCCCACCTTTCAGCAGGTCAGAGAGCGGG #<?@BFGBGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGEGGGGGGGGGGGGGGGGBGGDBGGGGGGGGG<D<GGGGGGGG.@G@EGGGGGGGGE@@DGGGGGGGCGGGEGGDEGGG @HS27_336:2:1101:12402:2163 ATTCTCTTGACTGACCACGCCTTTCTTCCCTCCCCTCGAAATGAAGCTACAACATCACCACGGGTCTGTACCCCTTCGAAGGGGACAACATCTACAAGTTGTTTGAGAACATCGGGAAGGGGAGC BBBBCGGGGGGGGGGFGGGGGFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@GGGGGGGGGGGGG0FGGFGEGBGGGGGGEGGGGGGGGGGGGGGGCFGGGGGCGDGGGCCA
708225 3408142 115 @HS27_336:2:1101:13262:2189 GAGTTTCTCACTGATATCGAATGCAATGGATGATCTGGGAAATAAGAAGAATTTATGGTATTGCCTACAAAGAAGTTGATGAACCGGTCCTTTACAGATGAAAGGACTTTGGCTCCCAGGGCGCT :@BBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGECGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEEFGGGBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF @HS27_336:2:1101:13262:2189 GGTGTTAGAAGAGCCCAGCCAGTGTCCTGACTGTGTGGTGAGCGCCCTGGGAGCCAAAGTCCTTTCATCTGTAAAGGACCGGTTCATCAACTTCTTTGTAGGCAATACCATAAATTCTTCTTATT ABABAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
1672350 2582257 144 @HS27_336:2:1101:16400:2184 NCAGTGTGTGGAGGAATTACATTCACCTCTTCATCAAGGTTACTTTTTCGTGGTGTTCTCTGTGTTTCAAAACTAAATAACAATAAGTGAAGTCATTCACATACTGAAAATTTACAATTTGTGCT #<<>AGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGCGGGGGGGGGGGGGGGGGGGGG>EGGGGGGGGG>FGGGGGGGGGGGGGFGGGGGEGGGGGGEGGGG @HS27_336:2:1101:16400:2184 CTCTATGATTTTATGAGACAACAGAAGCATTATACTGCTTTTTTGATGCATAAAGCACAAATTGTAAATTTTCAGTATGTGAATGACTTCACTTATTGTTATTTAGTTTTGAAACACAGAGAACA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGDEFCCGGGGCFGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGG<EGGGGGGFGGGGG
446900 2812708 152 @HS27_336:2:1101:16734:2170 NCGTCTCAGAGATAACCAATACATTACCACATCTGACTTGGTGGTAAACTTTTGAGTTTGCAGACTTTCCAAAGCCATCCACTTCACTGGCAGCTTTGCACCTGTTTTGTTGTGTACACTATAGT #=@BGEGGGGGGGGGGGGGGGGGGGGFGGGGG@FCDGGGGGGGGGGGGGGGGFFGGGEFGGFGGGGGGGGEGGGGGGGGGGGGGGGGGFGGCGGGGGGGGGGGFCG;FGEGGGFDGGE0DFGG @HS27_336:2:1101:16734:2170 ATGTTATTTCAGCCACGGGTAATAATTTTTGTCCTTTCTGTAGGCTGGATGAAAAATTCACAGTCAAGGTTGCTGATTTTGGTCTTGCCAGAGACATGTATGATAAAGAATACTATAGTGTACAC CCCCCGGGGGGGGGGGGGGDGGGGGFGGGGGFGGGGEGGGGGGGGGGGEGGGGGGFGGGGGFGGGGCCFGGGDFG>FGGFGGFGGGGGCGFGGGGGGGGGGGGGGGGGGGGGGFGGGGFGGGGGG
545275 2847558 155 @HS27_336:2:1101:17100:2147 NCAGTCATTTTTGTTGGTGTTGGCAGACCTTCTGAAATTTTATATGGACTCTTCAGGGGTGAAATATAGATGTTCCCTCCAGGAATCCGTAAGGGTGAACTAGGAAACTTGTAAGGGCTTCGAGG #<<@AFFGGGGGGGG1FEEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGCFGGGGGGGGGGGEGGGGFGGGGGGFFG @HS27_336:2:1101:17100:2147 TGACTTAGCCCCCTACCTTGTCACCAATACCTCACATTCCTCGAAGCCCTTACAAGTTTCCTAGTTCACCCTTACGGATTCCTGGAGGGAACATCTATATTTCACCCCTGAAGAGTCCATATAAA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG>DGGGGGGGGGGGFGGGGG
480225 4411453 210 @HS27_336:2:1101:2484:2389 AGACTTGGGCCTGGCCACATGCCCAGCAAGAGTCCCCATCCTAGCCCCTTGTGGACATAGGGGTTTGCTCCGGAGAGACCTGCAAAGAGCCCAGGTGCATACCTTGGCAATCTGCATACACCAGT 33BA>GCFFBGDBGGGGGGGDGG>GGGGGGGGDCGGGGGGGGGEF@>GE@G19FF1FGGEFGGCG1FGEGFDGGGGGGGGFGG8DG>>FFEBFFGGG>FF@=F8FF=FEGGEGGEG@F6EBG@ @HS27_336:2:1101:2484:2389 TAAGTGCCTTCTGGGCATCTGCCTGACATCCACGGTGCAGCTGGTGACACAGCTTATGCCCTATGGCTGCCTCTTAGACCATGTCCGGGAAAACCGCGGACGCCTGGGCTCCCAGGACCTGCTGA A3AAGEGGCGGGC1FFFGGC>FCDBDGGG>FFGGEGGGGG>@GGEGGEGGGE11FG@FFG>FGGGGGBGC1DG>11:FGCGGGGBGG@8FGG<EGGGGGGG;E<6DDEGGGG/CEGGGBG=EC
1195950 2824450 347 @HS27_336:2:1101:10867:2304 CACTGGCCCAGGTCTCACCAGGCCGCTACCCGGGCCACACACCACCCCTCTGCTGGTCACACCAGGCTGAGCCAGTGACCGCTGCTGCCTGGCCATGGCCTGACAACTCGTGCTATTTTTCCTCA 3>3:0>F00CCDFDGEGCGGGGFGCGA/CFFGGGGGBDGBBGGGGDGGEGGGGGEGGGGGCGG>BFBB@GGEGGG@FDG0CG>DCEG=FGGGGGGGGEGGGGGGB/.CDGGGB.8C@=@GGGBDD @HS27_336:2:1101:10867:2304 CCAGTATCTTTCCTAGGCTTCCCAAGGGCACTGCCTGCCCCATGGTGCACCTGGGATCCCTGGGAGCCCCGCCTCATCCCCGGGACTGGGCACCTGGCTCCTCTTCACGTAGGAATCCTCTTCAT B3A0AGGGFEF>1E1BDGEGGG>GD>0CFGGG@FGDFGGFGG1111?DF>CGGEG:1::BF@DEGGFFDGEEGGDFCGGGAGF.<CD@GFFGGGB=0FGGGGGC/6C8EDB=DG=GGGGGEDB

Then I was trying to pass the cluster file to calib_cons with the following command:

calib_cons -q 1.fastq 2.fastq -o 1.out 2.out -c test.cluster --min-reads-per-cluster 1

Reading cluster file: test.cluster
Reading fastq file: 1.fastq
Writing output files: 1.out
[spoa::Graph::add_alignment] error: empty sequence!

I'm then left with the following files in the working directory:

1.out.fastq 1.out.fastq0 1.out.fastq1 1.out.msa1 test.cluster

All files are empty except the test.cluster file :/. I have tried the calib_cons command with the various examples you've provided in the --help section, and they all result in the same message. Am I making another silly mistake?

baraaorabi commented 4 years ago

How did you generate the 1.fastq and 2.fastq files? They are supposed to be ungzipped versions of test_R1.fastq.gz and test_R2.fastq.gz. Can you also run head on them?

ChadFibke commented 4 years ago

I definitely misinterpreted the comment about the space-separated FASTQ list. I thought that since the sequences from the R1 and R2 FASTQ files were already present in the cluster file, the original FASTQ files were no longer needed. I passed the unzipped versions of test_R1.fastq.gz and test_R2.fastq.gz instead, and calib_cons finished successfully. Thanks for all the help!
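The resolution above comes down to two steps: decompress the original FASTQ files, then pass those decompressed files (not new or empty ones) to calib_cons after -q. A minimal Python sketch of this follows. The helper names are hypothetical, the command uses only the calib_cons flags that appear in this thread (-c, -q, -o), and the check for a .gz suffix simply follows the advice given here that the inputs should be ungzipped.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_copy(gz_path):
    """Decompress `x.fastq.gz` next to itself as `x.fastq`; return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, out_path.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def calib_cons_command(cluster_file, fastq_files, out_prefixes):
    """Build a calib_cons command line from decompressed FASTQ paths."""
    for path in fastq_files:
        if str(path).endswith(".gz"):
            raise ValueError(f"{path} is still gzipped; decompress it first")
    return [
        "calib_cons",
        "-c", str(cluster_file),
        "-q", *map(str, fastq_files),
        "-o", *map(str, out_prefixes),
    ]
```

With the file names from this thread, the two mates would be decompressed with `gunzip_copy("test_R1.fastq.gz")` and `gunzip_copy("test_R2.fastq.gz")`, and the resulting paths passed as `fastq_files` together with `test.cluster`.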

sandmanns commented 4 years ago

Hi @baraaorabi, I was wondering whether you have had time to give the error correction module another try with conda. I am using Calib, and the first analysis step works really well. However, I have been struggling to install the error correction module for three days now. While I could install Calib without any problem (I tested both conda and git), installing the error correction module always fails: it seems openssl/md5.h cannot be found (I am working on Linux). I have tried everything I could think of. The header should be there, but it still can't be found. So it would be really great if there were a conda version of the error correction module :-)

Best, Sarah

baraaorabi commented 4 years ago

Hi @sandmanns!

Please give the latest conda release a try. It should now include the error correction module. Let me know if it works!

sandmanns commented 4 years ago

Perfect! It works. Thanks a lot!!