marschall-lab / gaftools

General purpose utility related to GAF files
https://gaftools.readthedocs.io/
MIT License

Gaftools stat does not work with a GAF file as input #20

Closed fatimazahraabani closed 7 months ago

fatimazahraabani commented 7 months ago

When I run gaftools stat to get statistics on my GAF file, I get the error below. It looks like the problem is in the code itself, so there is nothing I can do about it on my side:

Traceback (most recent call last):
  File "/usr/local/bioinfo/miniconda3-23.10.0-1/envs/gaftools-0.1/bin/gaftools", line 8, in <module>
    sys.exit(main())
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/main.py", line 88, in main
    module.main(args)
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 159, in main
    run_stat(*vars(args))
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 51, in run_stat
    for alignment_count, mapping in enumerate(parse_gaf(gaf_path), 1):
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/gaf.py", line 66, in parse_gaf
    query_start = int(fields[2])
ValueError: invalid literal for int() with base 10: ''

asylvz commented 7 months ago

I guess you are using the latest commit? Which tool did you use to generate your GAF file? Also, if possible, can you paste a few lines of it here?

fatimazahraabani commented 7 months ago

I believe this is the latest commit. I installed it just a few days ago. The GAF file was generated by an alignment with VG Giraffe. Here are a few lines from it:

HWUSI-EAS454_0026_FC:2:1:13137:1176#ATCACG/1 76 0 76 + <15616590<15616589<15616586<15616584<15616583<15616581<15616578<15616577<15616576 114 17 93 76 76 60 AS:i:86 bq:Z:HIIGIGGGGFDDGGGDIIFFEBEIIHHFHAHIIHIHDIIHIIHIFGIDIGFIIIFIIGHDFHEHIC@CBCHGCHGC cs:Z::76 dv:f:0 fn:Z:HWUSI-EAS454_0026_FC:2:1:13137:1176#ATCACG/2 pd:b:1
HWUSI-EAS454_0026_FC:2:1:13137:1176#ATCACG/2 76 0 76 + >15616566>15616568>15616569>15616570>15616571>15616575>15616576 123 0 76 76 76 60 AS:i:86 bq:Z:IIIIIIIIHGHHHIIIBIIIEGIBIIGIIIHIIIGBIGHGIIIIIIBIIIGDGGGGGEGGEGFGIIIIHIEIIHII cs:Z::76 dv:f:0 fp:Z:HWUSI-EAS454_0026_FC:2:1:13137:1176#ATCACG/1 pd:b:1
HWUSI-EAS454_0026_FC:2:1:13334:1176#ATCACG/1 76 0 76 + <3252944 412 45 121 76 76 60 AS:i:86 bq:Z:HHHHHGEHGDHHHHHHHFHGHEHHHGGDHHHGHEHHGHHHHHHHBDEGDH>GDGBFHHHDFDHDFHGEHHHHFHHB cs:Z::76 dv:f:0 fn:Z:HWUSI-EAS454_0026_FC:2:1:13334:1176#ATCACG/2 pd:b:1
HWUSI-EAS454_0026_FC:2:1:13334:1176#ATCACG/2 76 0 76 + >3252944 412 106 182 76 76 60 AS:i:86 bq:Z:IIFIIGIGIGIIIIIGIIEIIIIIIIBIG@EGAGEGDGGGGGDEGEDDDGGCEIEFGDGGEGECDHIGHHIIHDGC cs:Z::76 dv:f:0 fp:Z:HWUSI-EAS454_0026_FC:2:1:13334:1176#ATCACG/1 pd:b:1
HWUSI-EAS454_0026_FC:2:1:16110:1176#ATCACG/1 76 0 76 + <13657604 1024 219 295 76 76 60 AS:i:86 bq:Z:GGGGDGG=@D:?=BBD>:=@GGGGADGGE;GAGG:DGGD@GFEEF8F2FFBABEEDFFECGEG,DGDD>DDAGG,E cs:Z::76 dv:f:0 fn:Z:HWUSI-EAS454_0026_FC:2:1:16110:1176#ATCACG/2 pd:b:1

asylvz commented 7 months ago

I actually don't see any issue, and gaftools parses these lines fine. It seems that the second column of one of the lines is not an integer or something else triggers this, I'm not sure. If you are able to share the gaf file, I can look into it in more detail.
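For anyone who hits the same traceback and cannot share the whole file, here is a minimal sketch for locating the offending lines without gaftools. It assumes the file is tab-separated and that the integer columns are the ones the GAF format describes (query length/start/end, path length/start/end, matches, block length, mapping quality). The name find_bad_lines and the sample records are made up for illustration and are not part of gaftools. Note that fields[2] in the traceback is the third column, the query start.

```python
import io

# Zero-based indices of the GAF columns expected to hold integers:
# query length/start/end, path length/start/end, matches, block length, MAPQ.
INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)


def find_bad_lines(handle):
    """Yield (line_number, column_number, value) for the first
    non-integer field found on each line of a tab-separated GAF."""
    for line_number, line in enumerate(handle, 1):
        fields = line.rstrip("\n").split("\t")
        for index in INT_COLUMNS:
            # A missing column is treated like an empty string.
            value = fields[index] if index < len(fields) else ""
            try:
                int(value)
            except ValueError:
                yield line_number, index + 1, value
                break  # one report per line is enough


# Made-up records: one regular alignment and one with empty/"*" fields.
sample = io.StringIO(
    "read1\t76\t0\t76\t+\t>1\t100\t0\t76\t76\t76\t60\n"
    "read2\t76\t\t\t*\t*\t\t\t\t\t\t0\n"
)
print(list(find_bad_lines(sample)))
```

To run it on a real file, pass an open file handle instead of the StringIO object, for example `with open("alignments.gaf") as handle:` (the file name is a placeholder).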

fatimazahraabani commented 7 months ago

Oh yes, I hadn't tried with just an excerpt. That's kind of you. I'll email you my entire GAF file so you can take a look.

fatimazahraabani commented 7 months ago

I'm sending you the download link for my entire gaf file in this e-mail.

https://filesender.renater.fr/?s=download&token=19edc25e-dd28-4f43-93a8-e74415a13416

Fatima-Zahra

On Fri, 5 Apr 2024 at 15:50, Arda Soylev @.***> wrote:

I actually don't see any issue, and gaftools parses these lines fine. It seems that the second column of one of the lines is not an integer or something else triggers this, I'm not sure. If you are able to share the gaf file, I can look into it in more detail.


asylvz commented 7 months ago

Thank you so much for reporting this. Your GAF has some lines with "*", which I guess are the unmapped reads. That was causing the issue.

Additionally, I see some alignments with a negative value in the 9th column (the path end), which I guess should be positive. E.g.,

HWUSI-EAS454_0026_FC:3:21:11831:16846#ATCACG 76 0 76 + >1416628>1416630 167 96 -56 69 76 24 AS:i:66 bq:Z:FEFCFDDFFD8EFFFHFEDIIDDDGF-FBBDD:D?EBBB@>GDGGEGGGGIDIHH?C38CC?-<@GDBBG###### cs:Z::26*TG:35*TC:8+GCGCC dv:f:0.0921 fp:Z:CJP75M1:362:C20PVACXX:7:2309:20935:50973
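A check like this can be sketched in a few lines. This is only an illustration and not how gaftools handles such records: the function name coordinate_problems is hypothetical, and the rule 0 <= start <= end <= length is an assumption based on the GAF column description (0-based start, open end). The sample line reuses the integer columns of the record above, with the read name shortened and the tags dropped.

```python
def coordinate_problems(line):
    """Return which coordinate triples ('query', 'path') of a
    tab-separated GAF record violate 0 <= start <= end <= length."""
    fields = line.rstrip("\n").split("\t")
    # Columns 2-4: query length, start, end.
    q_len, q_start, q_end = (int(x) for x in fields[1:4])
    # Columns 7-9: path length, start, end.
    p_len, p_start, p_end = (int(x) for x in fields[6:9])
    problems = []
    if not 0 <= q_start <= q_end <= q_len:
        problems.append("query")
    if not 0 <= p_start <= p_end <= p_len:
        problems.append("path")
    return problems


# Integer columns copied from the record above; name shortened, tags omitted.
record = "read\t76\t0\t76\t+\t>1416628>1416630\t167\t96\t-56\t69\t76\t24"
print(coordinate_problems(record))
```

The function expects records whose integer columns already parse, so unmapped lines containing "*" or empty fields need to be filtered out first.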

Now with the latest commit, we handle such alignments. Please let me know if you encounter another issue.

Arda