padlocbio / padloc

Locate antiviral defence systems in prokaryotic genomes
MIT License
46 stars, 9 forks

GFF issue or other issue? #48

Closed: raw937 closed this 9 hours ago

raw937 commented 3 days ago

Warning messages:
1: package ‘tidyverse’ was built under R version 4.3.3
2: package ‘ggplot2’ was built under R version 4.3.3
3: package ‘tibble’ was built under R version 4.3.3
4: package ‘tidyr’ was built under R version 4.3.3
5: package ‘readr’ was built under R version 4.3.3
6: package ‘purrr’ was built under R version 4.3.3
7: package ‘dplyr’ was built under R version 4.3.3
8: package ‘stringr’ was built under R version 4.3.3
9: package ‘forcats’ was built under R version 4.3.3
10: package ‘lubridate’ was built under R version 4.3.3
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Error in separate_wider_delim():
! Expected 2 pieces in each element of attributes.
! 1 value was too short.
ℹ Use too_few = "debug" to diagnose the problem.
ℹ Use too_few = "align_start"/"align_end" to silence this message.
Backtrace:

  1. ├─gff %>% filter(type == "CDS") %>% separate_attributes()
  2. ├─global separate_attributes(.)
  3. │ └─... %>% ...
  4. ├─tidyr::pivot_wider(., names_from = "key", values_from = "value")
  5. ├─dplyr::filter(., !key %in% names(gff))
  6. └─tidyr::separate_wider_delim(., attributes, names = c("key", "value"), delim = "=")
  7.   └─tidyr:::map_unpack(...)
  8.     └─tidyr (local) fun(data[[col]], col)
  9.       └─tidyr:::str_separate_wider_delim(...)
 10.         └─tidyr:::check_df_alignment(...)
 11.           └─cli::cli_abort(...)
 12.             └─rlang::abort(...)
Execution halted

      [15:52:34] ERROR >> errexit on line 425
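The tidyr error means that, when the attributes column was split on "=", at least one element had no "=" at all. It therefore could not be divided into a key and a value ("1 value was too short").

Below is a minimal Python sketch of that check. It is not padloc's R code. It assumes the attribute column is first split on ";" and each token is then split on "=". The helper name `check_key_value_tokens` is hypothetical.

```python
def check_key_value_tokens(attributes: str) -> dict[str, list[str]]:
    """Split a GFF attribute column on ';', then each token on '=', and
    report tokens that do not yield exactly two pieces (key and value)."""
    too_few, too_many = [], []
    for token in attributes.split(";"):
        pieces = token.split("=")
        if len(pieces) < 2:
            # No '=' at all, e.g. an empty token from a stray ';' or a fragment.
            too_few.append(token)
        elif len(pieces) > 2:
            # More than one '=' inside the token.
            too_many.append(token)
    return {"too_few": too_few, "too_many": too_many}
```

For example, `check_key_value_tokens("ID=1;Name=Hypothetical;")` returns `{"too_few": [""], "too_many": []}`: the trailing ";" leaves an empty token with no "=". Keys with empty values such as `Alias=` still split into two pieces and pass.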

I think it's the GFF? Thoughts?

Lines of the GFF:

gff-version 3
Sequence Data: seqnum=1;seqlen=499946;seqhdr="phageY"
Model Data: version=pyrodigal.v3.5.2;run_type=Metagenomic;model="59|Gut_phage_code_11c|V|29.9|11|1";gc_cont=29.95;transl_table=11;uses_sd=1
phageY pyrodigal_v3.5.2 CDS 407 610 21.2 + 0 ID=1;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 987 1292 36.5 + 0 ID=2;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1555 1833 36.1 + 0 ID=3;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1939 2334 50.7 + 0 ID=4;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2443 2628 19.5 + 0 ID=5;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2640 2921 26.0 + 0 ID=6;Name=hypothetical;Alias=phrog_15464;Dbxref=PHROG;evalue=3.8e-39;product_start=4;produ
ct_end=93;product_length=89

Many thanks, Rick

leightonpayne commented 3 days ago

Hi Rick,

If this is exactly how your GFF file appears:

gff-version 3
Sequence Data: seqnum=1;seqlen=499946;seqhdr="ATCC_phageG"
Model Data: version=pyrodigal.v3.5.2;run_type=Metagenomic;model="59|Gut_phage_code_11c|V|29.9|11|1";gc_cont=29.95;transl_table=11;uses_sd=1
phageY pyrodigal_v3.5.2 CDS 407 610 21.2 + 0 ID=1;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 987 1292 36.5 + 0 ID=2;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1555 1833 36.1 + 0 ID=3;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1939 2334 50.7 + 0 ID=4;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2443 2628 19.5 + 0 ID=5;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2640 2921 26.0 + 0 ID=6;Name=hypothetical;Alias=phrog_15464;Dbxref=PHROG;evalue=3.8e-39;product_start=4;produ
ct_end=93;product_length=89

Then I suspect that the parsing issue is with the header rows not being 'commented-out', i.e. they should start with a #:

#gff-version 3
#Sequence Data: seqnum=1;seqlen=499946;seqhdr="ATCC_phageG"
#Model Data: version=pyrodigal.v3.5.2;run_type=Metagenomic;model="59|Gut_phage_code_11c|V|29.9|11|1";gc_cont=29.95;transl_table=11;uses_sd=1
phageY pyrodigal_v3.5.2 CDS 407 610 21.2 + 0 ID=1;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 987 1292 36.5 + 0 ID=2;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1555 1833 36.1 + 0 ID=3;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 1939 2334 50.7 + 0 ID=4;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2443 2628 19.5 + 0 ID=5;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY pyrodigal_v3.5.2 CDS 2640 2921 26.0 + 0 ID=6;Name=hypothetical;Alias=phrog_15464;Dbxref=PHROG;evalue=3.8e-39;product_start=4;produ
ct_end=93;product_length=89
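To check for uncommented header rows, you can flag every line that is neither blank nor a "#" comment or "##" directive and does not have the nine tab-separated columns of a GFF3 feature line.

Below is a minimal Python sketch of that check. It assumes real tab separators (the pasted lines above show spaces, which is likely a rendering artifact) and ignores any "##FASTA" section. The name `find_malformed_lines` is hypothetical. The same check would also flag a feature line that has been wrapped onto two lines.

```python
# Number of tab-separated columns in a GFF3 feature line.
GFF_COLUMNS = 9


def find_malformed_lines(gff_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every non-blank line that is not a
    '#' comment or '##' directive and lacks nine tab-separated columns."""
    malformed = []
    for lineno, line in enumerate(gff_text.splitlines(), start=1):
        # Skip blank lines plus '##' directives and '#' comments.
        if not line.strip() or line.startswith("#"):
            continue
        if len(line.split("\t")) != GFF_COLUMNS:
            malformed.append((lineno, line))
    return malformed


sample = (
    "##gff-version 3\n"
    'Sequence Data: seqnum=1;seqlen=499946;seqhdr="phageY"\n'
    "phageY\tpyrodigal_v3.5.2\tCDS\t407\t610\t21.2\t+\t0\tID=1;Name=Hypothetical\n"
)
print(find_malformed_lines(sample))
```

Here only line 2, the "Sequence Data" header without a leading "#", is reported.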

Let me know if this fixes your issue!

raw937 commented 2 days ago

Nope, the headers look fine:

##gff-version  3
# Sequence Data: seqnum=1;seqlen=499946;seqhdr="phageY"
# Model Data: version=pyrodigal.v3.5.2;run_type=Metagenomic;model="59|Gut_phage_code_11c|V|29.9|11|1";gc_cont=29.95;transl_table=11;uses_sd=1
phageY  pyrodigal_v3.5.2    CDS 407 610 21.2    +   0   ID=phageY_1;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY  pyrodigal_v3.5.2    CDS 987 1292    36.5    +   0   ID=phageY_2;Name=Hypothetical;Alias=;Dbxref=;evalue=;product_start=;product_end=;product_length=
phageY  pyrodigal_v3.5.2    CDS 1555    1833    36.1    +   0

Thoughts?

raw937 commented 9 hours ago

It was a ';' error in the starting GFF. See the related issue "[padlocbio/padloc] error in 'mutate()'" (Issue #42).
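The thread does not say exactly what the ';' error was. The sketch below assumes it was an empty attribute token left by a leading, trailing, or doubled semicolon. It writes a cleaned copy of the attribute column (column 9) in Python.

This is not a padloc option, and the helper names (`clean_attributes`, `clean_gff_line`, `clean_gff_text`) are hypothetical. Comment, directive, and blank lines, as well as lines without nine tab-separated columns, are passed through unchanged.

```python
GFF_COLUMNS = 9


def clean_attributes(attributes: str) -> str:
    """Drop empty tokens left by leading, trailing or doubled ';'.
    Keys with empty values (e.g. 'Alias=') are kept."""
    return ";".join(tok for tok in attributes.split(";") if tok.strip())


def clean_gff_line(line: str) -> str:
    """Clean column 9 of a feature line; other lines are returned as-is."""
    if not line.strip() or line.startswith("#"):
        return line
    columns = line.split("\t")
    if len(columns) == GFF_COLUMNS:
        columns[8] = clean_attributes(columns[8])
    return "\t".join(columns)


def clean_gff_text(gff_text: str) -> str:
    """Apply clean_gff_line to every line of a GFF file's text."""
    return "\n".join(clean_gff_line(line) for line in gff_text.splitlines()) + "\n"
```

For example, `clean_attributes(";ID=1;;Name=Hypothetical;")` returns `"ID=1;Name=Hypothetical"`. Running the file through `clean_gff_text` before passing it to padloc would remove such tokens, under the assumption above.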