maedat / GFF2MSS

GFF2MSS; GFF3 converter for DDBJ submission via MSS
MIT License

ValueError: invalid literal for int() with base 10: '.' #14

Closed: kfuku52 closed this issue 1 year ago

kfuku52 commented 1 year ago

This error seems to happen when CDS features have no phase (frame) information in the GFF's 8th column, i.e. the column contains "." instead of 0, 1, or 2. This is a problem with the input GFF file rather than the program itself, but I am sharing the info here for people who encounter the same error.
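To check whether an input file is affected before running the converter, something like the following could be used. It is only a sketch using the standard library, not part of GFF2MSS; the function name `find_cds_without_phase` is made up for illustration, and it assumes a tab-separated GFF3 file with nine columns per record.

```python
def find_cds_without_phase(gff_lines):
    """Return (line_number, line) pairs for CDS records whose 8th column
    (phase) is not one of 0, 1, or 2. Hypothetical helper for illustration."""
    bad = []
    for line_number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: a valid GFF3 record has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        if fields[2] == "CDS" and fields[7] not in ("0", "1", "2"):
            bad.append((line_number, line))
    return bad


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "scaffold2\tsrc\tmRNA\t1\t20\t.\t+\t.\tID=t1",
        "scaffold2\tsrc\tCDS\t1\t4\t.\t+\t.\tID=c1;Parent=t1",
        "scaffold2\tsrc\tCDS\t10\t14\t.\t+\t2\tID=c2;Parent=t1",
    ]
    for line_number, line in find_cds_without_phase(example):
        print(f"line {line_number}: CDS without phase: {line}")

    # The same failure as in the traceback, reduced to plain Python:
    try:
        int(".")
    except ValueError as error:
        print(error)  # invalid literal for int() with base 10: '.'
```

Any line it reports would trigger the `astype({'phase': int})` failure shown in the traceback below.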

GFF2MSS.py --fasta /Users/kf/Dropbox/data/GENUS_SPECIES/genome_for_publication/GENUS_SPECIES_male_HiC.fa --gff sorted.gff --ann GENUS_SPECIES.annotation.tsv --loc Nepgr --nam GENUS SPECIES --stn male --gty proximity ligation --gct 1 --out Nepgr.mss.txt 

new_contig
Processing scaffold2
Gap finding
Gap find end
GFF Processing
Traceback (most recent call last):
  File "/Users/kf/Dropbox/repos/GFF2MSS/GFF2MSS.py", line 468, in <module>
    locus_tag_counter, OUT_CHA = GFF_TO_CDS(gff_df_col,in_file, NowContig, locus_tag_counter, anno_DF, pid_DF, OUT_CHA, GAP_DF)
  File "/Users/kf/Dropbox/repos/GFF2MSS/GFF2MSS.py", line 414, in GFF_TO_CDS
    OUT_CHA = mRNA_MAKE_NP(gff_df_col_F, rec_sub, locus_tag_prefix, locus_tag_counter, anno_DF, pid_DF, OUT_CHA, GAP_DF)
  File "/Users/kf/Dropbox/repos/GFF2MSS/GFF2MSS.py", line 304, in mRNA_MAKE_NP
    gff_df_col_F_sub_sub = gff_df_col_F_sub_sub.astype({'phase': int})
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 6226, in astype
    res_col = col.astype(dtype=cdt, copy=copy, errors=errors)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 450, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: invalid literal for int() with base 10: '.'

kfuku52 commented 1 year ago

In case someone wants to fill in the CDS frames in a similar situation: https://gist.github.com/kfuku52/96612cb8a78562a6ae9c9e38caa4a355
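For readers who just want the idea, here is a rough sketch of how CDS phases can be derived from segment lengths. This is not the content of the gist above, only an illustration under stated assumptions: each CDS has a single `Parent` attribute, the first CDS segment of every transcript starts at the first base of a codon (phase 0), and the helper names `parent_of` and `fill_cds_phases` are made up here.

```python
from collections import defaultdict


def parent_of(attributes):
    """Return the value of the Parent attribute, or None if absent."""
    for item in attributes.split(";"):
        key, _, value = item.strip().partition("=")
        if key == "Parent":
            return value
    return None


def fill_cds_phases(gff_lines):
    """Return GFF lines with the phase column of every CDS recomputed.

    Assumption: the first CDS segment of each transcript has phase 0.
    """
    rows = [line.rstrip("\n").split("\t") for line in gff_lines]

    # Group CDS row indices by their parent transcript.
    by_parent = defaultdict(list)
    for index, fields in enumerate(rows):
        if len(fields) == 9 and fields[2] == "CDS":
            parent = parent_of(fields[8])
            if parent is not None:
                by_parent[parent].append(index)

    for indices in by_parent.values():
        # Translation order: ascending start on "+", descending on "-".
        reverse = rows[indices[0]][6] == "-"
        indices.sort(key=lambda i: int(rows[i][3]), reverse=reverse)
        phase = 0
        for i in indices:
            rows[i][7] = str(phase)
            length = int(rows[i][4]) - int(rows[i][3]) + 1
            # Bases still needed by the next segment to complete a codon.
            phase = (3 - (length - phase) % 3) % 3

    return ["\t".join(fields) for fields in rows]


if __name__ == "__main__":
    example = [
        "scaffold2\tsrc\tCDS\t1\t4\t.\t+\t.\tID=c1;Parent=t1",
        "scaffold2\tsrc\tCDS\t10\t14\t.\t+\t.\tID=c2;Parent=t1",
    ]
    print("\n".join(fill_cds_phases(example)))
```

Transcripts with partial 5' ends would need the true starting phase instead of 0, so the output should be checked against the translated protein before submission.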

maedat commented 1 year ago

Thank you, @kfuku52! Your contribution was seriously a lifesaver! I also wanted to mention that the program was made by ChatGPT. It's pretty darn impressive. The fact that an AI program can help us bioinformaticians develop better scripts is a breakthrough.