miRTop / mirtop

Command-line tool to annotate miRNAs with standard miRNA/isomiR naming
https://mirtop.readthedocs.org
MIT License

gff from sRNAbench output - invalid literal error #77

Open jonahcullen opened 1 year ago

jonahcullen commented 1 year ago

Expected behavior and actual behavior.

I am attempting to run mirtop gff on the output of sRNAbench, and I expect a GFF to be returned. The same command works with the output from miraligner.

Steps to reproduce the problem.

mirtop gff --format srnabench --sps eca --hairpin hairpin.fa --gtf eca.gff3 -o HERE ../LocalTEST/

returns

04/11/2023 03:21:21 INFO Run annotation
Traceback (most recent call last):
  File "/opt/conda/envs/small/bin/mirtop", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/command_line.py", line 31, in main
    reader(kwargs["args"])
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/gff/__init__.py", line 49, in reader
    out_dts[fn] = srnabench.read_file(fn, args)
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 47, in read_file
    source_iso = _read_iso(reads_iso)
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 169, in _read_iso
    iso[(cols[0], m)] = _translate(anno[m], cols[4])
  File "/opt/conda/envs/small/lib/python3.10/site-packages/mirtop/importer/srnabench.py", line 206, in _translate
    iso.extend(_iso_snp(int(nt.split(":")[0])))
ValueError: invalid literal for int() with base 10: '-$16'

Specifications like the version of the project, operating system, or hardware.

I am using mirtop (0.4.25) and sRNAbench.jar (2.0) on a university HPC running CentOS Linux 7.

Thanks for your time, Jonah.

jonahcullen commented 1 year ago

Apologies, I should have looked a little closer and reported the isoLabel that is causing the issue: -$16:G>A,19:T>C,20:G>A. The full line (excluding RPMs) is:

TGGAATGTAAGGAAGTATGCAG  eca-miR-1$eca-miR-206   eca-mir-1-2$eca-mir-1-1$eca-mir-206-2   nta#G|nta#G#1$NucVar    -$16:G>A,19:T>C,20:G>A

lpantano commented 1 year ago

Thank you for submitting this error. Could you share the hairpin file and the GFF file? I can try to debug with those and the line you identified as problematic.

lpantano commented 1 year ago

Do you know what the - symbol would mean there?

jonahcullen commented 1 year ago

Thank you for your response! I do not know what - means here. It occurs rarely alongside other variants (e.g. 18:A>G), but it always causes an error when it is the first one listed. That same column (sequenceVariant) also contains - on its own, with no other variants. I'm guessing the update from sRNAbench v1.2 or v1.6 to 2.0 is what is causing the issue. For example, exact no longer occurs anywhere in the microRNAannotation.txt file.
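
For anyone debugging this, here is a minimal sketch that reproduces the ValueError outside mirtop and shows one tolerant way to read the field. It is not mirtop's code: parse_variants is a hypothetical helper, and it rests on two unconfirmed assumptions, namely that $ separates one entry per annotated miRNA (as it appears to in the name columns of the same line) and that - marks an entry with no variants.

```python
def parse_variants(field):
    """Split a sequenceVariant field into one SNP list per annotated miRNA.

    Assumptions (not confirmed by sRNAbench documentation):
      * '$' separates the entries for each annotated miRNA
      * ',' separates variants within an entry
      * '-' means "no variant" for that entry
    """
    parsed = []
    for entry in field.split("$"):
        snps = []
        for nt in entry.split(","):
            if nt in ("", "-"):
                continue  # placeholder, nothing to record
            pos, change = nt.split(":")
            ref, alt = change.split(">")
            snps.append((int(pos), ref, alt))
        parsed.append(snps)
    return parsed


field = "-$16:G>A,19:T>C,20:G>A"

# Roughly what the traceback shows: the first comma-separated token is
# "-$16:G>A", so the text before ":" is "-$16", which int() rejects.
try:
    int(field.split(",")[0].split(":")[0])
except ValueError as err:
    print(err)

# Splitting on "$" first avoids the problem under the assumptions above.
print(parse_variants(field))
```

If the assumptions hold, the first entry (for eca-miR-1) is empty and the three substitutions belong to the second entry (eca-miR-206). A field that is just - would give a single empty entry.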

I've attached the eca3.ens_mirtop.gff.txt, hairpin.fa.txt (miRBase v22 filtered to include only eca), and microRNAannotation.txt files. Apologies, I had to append .txt to the FASTA and GFF file names.