ucagenomix / sicelore-2.1

MIT License

java.lang.NumberFormatException in SelectValidCellBarcode #1

Closed yuntianf closed 1 year ago

yuntianf commented 1 year ago

Hi, I appreciate your effort on such a great pipeline, but I hit a java.lang.NumberFormatException in SelectValidCellBarcode when I ran the pipeline on a Nanopore long-read sequencing dataset based on the 10X 5' toolkit. Below is the error output:

**********    SelectValidCellBarcode -I BarcodesAssigned.tsv -O barcodes.csv -MINUMI 1 -ED0ED1RATIO 1
**********

[Tue Nov 22 11:31:11 EST 2022] SelectValidCellBarcode INPUT=BarcodesAssigned.tsv OUTPUT=barcodes.csv MINUMI=1 ED0ED1RATIO=1.0    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Nov 22 11:31:11 EST 2022] Executing on Linux 3.10.0-1160.62.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 19.0.1+10-21; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null
java.lang.NumberFormatException: For input string: ""
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Integer.parseInt(Integer.java:675)
        at java.base/java.lang.Integer.<init>(Integer.java:1120)
        at org.ipmc.sicelore.programs.SelectValidCellBarcode.doWork(SelectValidCellBarcode.java:67)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:303)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at org.ipmc.sicelore.cmdline.SiCeLoReMain.main(SiCeLoReMain.java:31)
INFO    2022-11-22 11:31:11     SelectValidCellBarcode  Total cell barcodes             [266]
INFO    2022-11-22 11:31:11     SelectValidCellBarcode  Valid cell barcodes             [263]
[Tue Nov 22 11:31:11 EST 2022] org.ipmc.sicelore.programs.SelectValidCellBarcode done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=81133568

I have also attached part of the input file BarcodesAssigned.tsv to help with debugging:

Barcode n Reads with ED<=1 match        ED=0    ED=1
AACTGGTCAATGGTCT        3,811   3,548   263
AGATCTGAGTAGGCCA        2,606   2,223   383
CTCGGGACAGCTTAAC        1,990   1,486   504
CACACCTCATTGGGCC        1,821   1,505   316
GGGACCTTCTCCTATA        1,736   1,361   375
CGGAGTCCACAACGTT        1,658   1,430   228
ACATGGTCATGCAACT        1,619   1,425   194
TTAACTCTCCTAGAAC        1,439   1,268   171
CGGTTAACAATGGAGC        1,279   1,132   147
TTGGCAAAGTGGAGTC        1,251   1,041   210
GACGCGTTCAGTCAGT        1,173   1,026   147
TTCCCAGTCTAGAGTC        1,154   989     165
TAGTTGGTCACGATGT        1,121   1,033   88
TACTCATTCCGTTGCT        1,047   929     118
AAACGGGGTTATGTGC        1,046   765     281
CTTTGCGTCGGAAACG        1,017   840     177
GCAAACTCACTGTTAG        998     860     138
TTCGAAGTCCACGACG        997     827     170
AGGCCGTGTGATGCCC        929     801     128
TTGGAACGTCTGCAAT        859     623     236
GTTACAGAGGAGTTTA        858     716     142
CGGACTGTCGAGAACG        855     702     153
GCTCCTATCTTAGCCC        843     707     136
CGATGGCCATTGAGCT        836     743     93
CATGACATCAGTACGT        821     750     71
CTTACCGCATTATCTC        813     685     128
CTACACCCAAGCGATG        782     679     103
TCTCTAATCACGACTA        757     630     127
TAAGTGCTCGAATCCA        706     606     100
CTCGGAGCATTCTCAT        703     541     162
TTTCCTCGTAACGCGA        701     578     123
CGGACGTAGGGCTCTC        699     560     139

I would appreciate any thoughts on how to fix this problem. Thanks!
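For readers hitting the same trace: the stack shows `Integer.<init>` calling `Integer.parseInt` with an empty string at SelectValidCellBarcode.java:67, so one of the fields read from the input was blank. The sicelore source is not shown in this thread, so the following is only a minimal sketch of defensive count parsing, assuming tab-separated columns and comma thousands separators as in the excerpt above. `CountFieldParser` and `parseCount` are hypothetical names, not part of sicelore.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of sicelore: parses one count column defensively.
class CountFieldParser {

    // Returns the parsed count, or an empty OptionalInt when the field is
    // missing or blank. A non-numeric field such as "abc" still throws
    // NumberFormatException, which is the desired behavior for corrupt input.
    static OptionalInt parseCount(String field) {
        if (field == null) {
            return OptionalInt.empty();
        }
        // Drop thousands separators ("3,811" -> "3811") and surrounding whitespace.
        String cleaned = field.replace(",", "").trim();
        if (cleaned.isEmpty()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(cleaned));
    }

    public static void main(String[] args) {
        // First data row from the excerpt, with tabs assumed as the separator.
        String row = "AACTGGTCAATGGTCT\t3,811\t3,548\t263";
        String[] tok = row.split("\t");
        System.out.println(parseCount(tok[1]));
        // A blank field is reported as empty instead of raising an exception.
        System.out.println(parseCount(""));
    }
}
```

Returning an `OptionalInt` leaves the decision to the caller: a blank field can be skipped, logged, or treated as an error, rather than aborting the whole run partway through the file.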

ucagenomix commented 1 year ago

Hi,

Could you check whether the new Sicelore-2.1.jar file still gives this error? You will need to re-clone the GitHub repository to test it on your system.

kevin

yuntianf commented 1 year ago

Hi Kevin, thanks for your quick response. I re-cloned the repository and re-tested it with the BarcodesAssigned.tsv above. It still gives the same error, but with somewhat different output:

INFO    2022-11-22 13:56:43     SelectValidCellBarcode  

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    SelectValidCellBarcode -I BarcodesAssigned.tsv -O barcodes.csv -MINUMI 1 -ED0ED1RATIO 1
**********

[Tue Nov 22 13:56:44 EST 2022] SelectValidCellBarcode INPUT=BarcodesAssigned.tsv OUTPUT=barcodes.csv MINUMI=1 ED0ED1RATIO=1.0    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Nov 22 13:56:44 EST 2022] Executing on Linux 3.10.0-1160.62.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 19.0.1+10-21; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null
INFO    2022-11-22 13:56:44     SelectValidCellBarcode  CCCAGTTTCCTTGCCA barcode removed                [CCCAGTTTCCTTGCCA       8       3       5]
INFO    2022-11-22 13:56:44     SelectValidCellBarcode  AGGTCCGAGCTGATAA barcode removed                [AGGTCCGAGCTGATAA       6       2       4]
INFO    2022-11-22 13:56:44     SelectValidCellBarcode  TCTGAGAAGGAGTACC barcode removed                [TCTGAGAAGGAGTACC       3       1       2]
java.lang.NumberFormatException: For input string: ""
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Integer.parseInt(Integer.java:675)
        at java.base/java.lang.Integer.<init>(Integer.java:1120)
        at org.ipmc.sicelore.programs.SelectValidCellBarcode.doWork(SelectValidCellBarcode.java:67)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:303)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at org.ipmc.sicelore.cmdline.SiCeLoReMain.main(SiCeLoReMain.java:31)
INFO    2022-11-22 13:56:44     SelectValidCellBarcode  Total cell barcodes             [266]
INFO    2022-11-22 13:56:44     SelectValidCellBarcode  Valid cell barcodes             [263]
[Tue Nov 22 13:56:44 EST 2022] org.ipmc.sicelore.programs.SelectValidCellBarcode done. Elapsed time: 0.00 minutes.

ucagenomix commented 1 year ago

Hi, it would be best if I had access to the BarcodesAssigned.tsv file. I've changed the code and hope it works this time. Give it a try, and if it still fails, please copy and paste your file.

best,
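When sharing the whole file is not practical, a quick scan can point to the offending rows. This is a rough sketch under assumptions taken from the excerpt posted earlier in the thread: one header line, then tab-separated rows with a barcode followed by three count columns that may contain comma thousands separators. `TsvCountCheck` and `findBadRows` are hypothetical names and are not part of sicelore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of sicelore.
class TsvCountCheck {

    // Returns the 1-based line numbers of data rows whose three count
    // columns are missing, blank, or non-numeric. Line 1 is the header.
    static List<Integer> findBadRows(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            String[] tok = lines.get(i).split("\t", -1);
            boolean ok = tok.length >= 4;
            for (int c = 1; ok && c < 4; c++) {
                ok = tok[c].replace(",", "").trim().matches("\\d+");
            }
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java TsvCountCheck BarcodesAssigned.tsv");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Rows with unparsable counts: " + findBadRows(lines));
    }
}
```

Blank lines and rows with an empty count column are both reported, which covers the `For input string: ""` case seen in the trace.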

yuntianf commented 1 year ago

Hi Kevin, thanks, it works this time.

INFO    2022-11-23 09:57:25     SelectValidCellBarcode  

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    SelectValidCellBarcode -I BarcodesAssigned.tsv -O barcodes.csv -MINUMI 1 -ED0ED1RATIO 1
**********

[Wed Nov 23 09:57:26 EST 2022] SelectValidCellBarcode INPUT=BarcodesAssigned.tsv OUTPUT=barcodes.csv MINUMI=1 ED0ED1RATIO=1.0    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 23 09:57:26 EST 2022] Executing on Linux 3.10.0-1160.62.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 19.0.1+10-21; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: null
INFO    2022-11-23 09:57:26     SelectValidCellBarcode  CCCAGTTTCCTTGCCA barcode removed                [CCCAGTTTCCTTGCCA       8       3       5]
INFO    2022-11-23 09:57:26     SelectValidCellBarcode  AGGTCCGAGCTGATAA barcode removed                [AGGTCCGAGCTGATAA       6       2       4]
INFO    2022-11-23 09:57:26     SelectValidCellBarcode  TCTGAGAAGGAGTACC barcode removed                [TCTGAGAAGGAGTACC       3       1       2]
INFO    2022-11-23 09:57:26     SelectValidCellBarcode  Total cell barcodes             [272]
INFO    2022-11-23 09:57:26     SelectValidCellBarcode  Valid cell barcodes             [269]
[Wed Nov 23 09:57:26 EST 2022] org.ipmc.sicelore.programs.SelectValidCellBarcode done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=81133568