iferres / pagoo

A comprehensive and intuitive encapsulated OO class system for analyzing bacterial pangenomes in R.
https://iferres.github.io/pagoo/

roary_2_pagoo Error: subscript contains invalid names #59

Closed jhcuarta closed 1 year ago

jhcuarta commented 1 year ago

Hi, I encountered the following error while trying to create the R6 class object. This is the script I'm using and the corresponding error:

library(pagoo)

gffs <- list.files(pattern = "[.]gff$", recursive = TRUE, full.names = TRUE)

gpa_csv <- "/home/jason/Documents/pagoo/gene_presence_absence.csv"

p <- roary_2_pagoo(gene_presence_absence_csv = gpa_csv, gffs = gffs, sep = "__", paralog_sep = "\t")

Reading csv file (roary).
Processing csv file.
Reading gff file ./10432_62_LANL.gff
Reading gff file ./107V1216_BRAC.gff
Reading gff file ./1154_74_LANL.gff
Reading gff file ./11S_UM.gff
Reading gff file ./1346_SC.gff
Reading gff file ./1362_SC.gff
. . .
Reading gff file ./YB8E08_UA.gff
Reading gff file ./YN2011004_YPCDCP.gff
Reading gff file ./YN89004_YPCDCP.gff
Reading gff file ./YN97083_YPCDCP.gff
Error: subscript contains invalid names

iferres commented 1 year ago

Hi @jhcuarta, I'm out of office without my laptop for the next two weeks. I'll set a reminder to address this as soon as I can. Sorry. Best!

malihaaziz commented 1 year ago

It's crashing on these lines in the roary_2_pagoo script:

## Selected columns
cols <- c('seqid', 'type', 'start', 'end', 'strand', 'product', 'org', 'locus_tag')
mcls <- lapply(mcls, function(x) x[ , cols])

You are not reading in any 'product' in your read_gff function
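To illustrate the failure mode described here: selecting a column that a table does not have stops with an error. The sketch below uses a plain base R data.frame, not pagoo's internal objects, so the exact message differs from the one in the report. `gff_table` and `missing_cols()` are made-up names for illustration only.

```r
# Hypothetical stand-in for the table read from one gff file;
# note that it has no 'product' column.
gff_table <- data.frame(
  seqid     = "contig_1",
  type      = "CDS",
  start     = 1L,
  end       = 1338L,
  strand    = "+",
  org       = "genome_A",
  locus_tag = "A_00001",
  stringsAsFactors = FALSE
)

# Columns selected by the snippet quoted in the comment.
cols <- c("seqid", "type", "start", "end", "strand", "product", "org", "locus_tag")

# Hypothetical helper: report which requested columns are absent.
missing_cols <- function(x, wanted) setdiff(wanted, colnames(x))

absent <- missing_cols(gff_table, cols)
print(absent)

# Subsetting by the full vector fails because one column is absent;
# the error is caught here so the script keeps running.
res <- tryCatch(gff_table[, cols], error = function(e) conditionMessage(e))
print(res)
```

Checking `setdiff()` against the column names before subsetting turns an opaque subscript error into a list of the columns that are actually missing.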

iferres commented 1 year ago

Thanks @malihaaziz!! And very sorry @jhcuarta, I completely forgot about this issue after my vacation.

I'm not able to reproduce the error. Could you provide a reproducible example? As small as possible please 😬. I would need the gffs and the gene_presence_absence.csv file. You could use WeTransfer or send them to my email: iferres at pasteur dot edu dot uy

jhcuarta commented 1 year ago

Hi @iferres, I edited all the genome sequences and right now I'm re-annotating. I hope to run roary soon and send you the files next week. I beg your pardon.

Best regards

iferres commented 1 year ago

Don't worry, I'm the delayed person here. I ran roary_2_pagoo with version 0.3.17 on a test dataset and everything looks OK:

> suppressPackageStartupMessages(library(pagoo))
> gffs <- list.files("gffs/", full.names=T)
> csv <- "roary_out/gene_presence_absence.csv"
> p <- roary_2_pagoo(csv, gffs)
Reading csv file (roary).
Processing csv file.
Reading gff file gffs//Hinfluenzae_2019.gff
Reading gff file gffs//Hinfluenzae_86-028NP.gff
Reading gff file gffs//Hinfluenzae_CGSHiCZ412602.gff
Reading gff file gffs//Hinfluenzae_KR494.gff
Reading gff file gffs//Hinfluenzae_PittEE.gff
Reading gff file gffs//Hinfluenzae_R2846.gff
Reading gff file gffs//Hinfluenzae_R2866.gff
Reading gff file gffs//Hinfluenzae_Rd_KW20.gff
Reading gff file gffs//Hinfluenzae_strain_FDAARGOS_199.gff
Reading gff file gffs//Hinfluenzae_strain_NCTC11931.gff
Loading PgR6MS class object.
Checking class.
Checking dimnames.
Creating gid (gene ids).
Checking provided cluster metadata.
Creating panmatrix.
Populating class.
Checking input sequences.
Checking that sequence names matches with DataFrame.
Adding metadata to sequences.
Done.

malihaaziz commented 1 year ago

I'm running into the same error with both roary_2_pagoo and panaroo_2_pagoo, which is why I debugged your code. My genomes are prokka-annotated. I'm a bit surprised that you are not running into this error.

malihaaziz commented 1 year ago

I have created a test that is crashing. Can you please test it with your version of the code? This is a panaroo-generated csv. gff-test.zip gene_presence_absence-test.csv

iferres commented 1 year ago

@malihaaziz Yep, at least the panaroo csv file you sent me looks different from the one I used to test the function. There are two bugs: one pops up when passing only the csv, and the other with the csv and the gff files. I want to take some time to see whether there are several versions of the csv and what is happening with the gffs. I don't want to fix it for this case and break it for the others. By the way, which version of panaroo are you using?

malihaaziz commented 1 year ago

(base) [mlaziz@log002 ~]$ panaroo --version
panaroo 1.2.9

malihaaziz commented 1 year ago

I downloaded the latest version of panaroo (version 1.3.2) and re-ran it on my gffs. I still get the same error when I run pagoo.

iferres commented 1 year ago

Thank you, I will look into this next week.

iferres commented 1 year ago

Hi, which version of prokka are you using? I find some discrepancies between the gffs you sent me and the ones I got with version 1.14.6-0 (conda). For instance, I see duplicated gene names:

==> 05_A8.gff <==
##gff-version 3
##sequence-region gnl|LIUPRICE|05_A8_5_1 1 1927179
gnl|LIUPRICE|05_A8_5_1  prokka  gene    1   1338    .   +   .   ID=05_A8_5_00001_gene;Name=dnaA;gene=dnaA;locus_tag=05_A8_5_00001
gnl|LIUPRICE|05_A8_5_1  Prodigal:002006 CDS 1   1338    .   +   0   ID=05_A8_5_00001;Parent=05_A8_5_00001_gene;Name=dnaA;db_xref=COG:COG0593;gene=dnaA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P05648;locus_tag=05_A8_5_00001;product=Chromosomal replication initiator protein DnaA;protein_id=gnl|LIUPRICE|05_A8_5_00001
gnl|LIUPRICE|05_A8_5_1  prokka  gene    1569    2705    .   +   .   ID=05_A8_5_00002_gene;Name=dnaN;gene=dnaN;locus_tag=05_A8_5_00002
gnl|LIUPRICE|05_A8_5_1  Prodigal:002006 CDS 1569    2705    .   +   0   ID=05_A8_5_00002;Parent=05_A8_5_00002_gene;Name=dnaN;db_xref=COG:COG0592;gene=dnaN;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P05649;locus_tag=05_A8_5_00002;product=Beta sliding clamp;protein_id=gnl|LIUPRICE|05_A8_5_00002
gnl|LIUPRICE|05_A8_5_1  prokka  gene    2921    3157    .   +   .   ID=05_A8_5_00003_gene;locus_tag=05_A8_5_00003
gnl|LIUPRICE|05_A8_5_1  Prodigal:002006 CDS 2921    3157    .   +   0   ID=05_A8_5_00003;Parent=05_A8_5_00003_gene;inference=ab initio prediction:Prodigal:002006;locus_tag=05_A8_5_00003;product=hypothetical protein;protein_id=gnl|LIUPRICE|05_A8_5_00003
gnl|LIUPRICE|05_A8_5_1  prokka  gene    3160    4305    .   +   .   ID=05_A8_5_00004_gene;Name=recF_1;gene=recF_1;locus_tag=05_A8_5_00004
gnl|LIUPRICE|05_A8_5_1  Prodigal:002006 CDS 3160    4305    .   +   0   ID=05_A8_5_00004;Parent=05_A8_5_00004_gene;Name=recF_1;db_xref=COG:COG1195;gene=recF_1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:Q8RDL3;locus_tag=05_A8_5_00004;product=DNA replication and repair protein RecF;protein_id=gnl|LIUPRICE|05_A8_5_00004

It is probably a flag you pass to prokka that subdivides annotations into "Prodigal" and "prokka" entries (see the second column of each entry).

iferres commented 1 year ago

Yes indeed, it's the --compliant flag in prokka. I'm not sure how panaroo and roary treat this gff variant, so I would suggest re-running prokka without the --compliant flag.

Also, there is a bug which raises an error when reading the csv without the gffs. I'm pushing some changes to address that. I will let you know.
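For anyone who wants to check their own files for this pattern, here is a minimal base R sketch that counts how many feature rows share each locus_tag. It uses four rows modelled on the excerpt above, with the seqid shortened to a placeholder `c1` and the attribute column trimmed. It only illustrates the check and is not how pagoo parses gffs.

```r
# Four rows modelled on the excerpt: one 'gene' row and one 'CDS' row
# per locus_tag. The seqid "c1" is a shortened placeholder.
gff_lines <- c(
  paste("c1", "prokka", "gene", 1, 1338, ".", "+", ".",
        "ID=05_A8_5_00001_gene;locus_tag=05_A8_5_00001", sep = "\t"),
  paste("c1", "Prodigal:002006", "CDS", 1, 1338, ".", "+", "0",
        "ID=05_A8_5_00001;locus_tag=05_A8_5_00001", sep = "\t"),
  paste("c1", "prokka", "gene", 1569, 2705, ".", "+", ".",
        "ID=05_A8_5_00002_gene;locus_tag=05_A8_5_00002", sep = "\t"),
  paste("c1", "Prodigal:002006", "CDS", 1569, 2705, ".", "+", "0",
        "ID=05_A8_5_00002;locus_tag=05_A8_5_00002", sep = "\t")
)

# Parse the nine standard gff columns.
gff <- read.delim(
  text = gff_lines, header = FALSE, quote = "", stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Pull the locus_tag value out of the attributes column.
gff$locus_tag <- sub(".*locus_tag=([^;]+).*", "\\1", gff$attributes)

# Feature types recorded for each locus_tag.
types_per_tag <- tapply(gff$type, gff$locus_tag,
                        function(x) paste(sort(x), collapse = ","))
print(types_per_tag)

# locus_tags that appear on more than one row.
dup_tags <- names(which(table(gff$locus_tag) > 1))
print(dup_tags)
```

If `dup_tags` is non-empty for a file, each gene is described by more than one row, which is the situation discussed in this comment.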

malihaaziz commented 1 year ago

Thank you for troubleshooting this. I'll switch everything over to Bakta.

iferres commented 1 year ago

Now it should work with your gene_presence_absence.csv file:

#Reinstall pagoo from source
devtools::install_github("iferres/pagoo") # installs 0.3.18

library(pagoo)
p <- panaroo_2_pagoo("gene_presence_absence.csv")

Prokka is ok, the thing which is causing the issue is the --compliant flag.

jhcuarta commented 1 year ago

Hi iferres, the same error continues to occur:

setwd("~/Documents/pagoo")
suppressPackageStartupMessages(library(pagoo))
gffs <- list.files("/home/jason/Documents/pagoo/gffs", full.names=T)
csv <- "gene_presence_absence.csv"
p <- roary_2_pagoo(csv, gffs)
Reading csv file (roary).
Processing csv file.
Reading gff file /home/jason/Documents/pagoo/gffs/10432_62_LANL.gff
Reading gff file /home/jason/Documents/pagoo/gffs/107V1216_BRAC.gff
Reading gff file /home/jason/Documents/pagoo/gffs/1154_74_LANL.gff
Reading gff file /home/jason/Documents/pagoo/gffs/11S_UM.gff
Reading gff file /home/jason/Documents/pagoo/gffs/1346_SC.gff
Reading gff file /home/jason/Documents/pagoo/gffs/1362_SC.gff
Reading gff file /home/jason/Documents/pagoo/gffs/146N_ILS.gff
Reading gff file /home/jason/Documents/pagoo/gffs/146P_ILS.gff
. . .
Reading gff file /home/jason/Documents/pagoo/gffs/YB7A06_UA.gff
Reading gff file /home/jason/Documents/pagoo/gffs/YB7A09_UA.gff
Reading gff file /home/jason/Documents/pagoo/gffs/YB8E08_UA.gff
Reading gff file /home/jason/Documents/pagoo/gffs/YN2011004_YPCDCP.gff
Reading gff file /home/jason/Documents/pagoo/gffs/YN89004_YPCDCP.gff
Reading gff file /home/jason/Documents/pagoo/gffs/YN97083_YPCDCP.gff
Error: subscript contains invalid names

I annotated all genomes using prokka 1.14.6, with the following command line:

prokka --setupdb && prokka --genus Vibrio --species cholerae --usegenus --prefix 1Mo_UM --cpus 8 --outdir 1Mo_UM --rfam --addgenes --addmrna --cdsrnaolap 1Mo_UM.fna

I used roary 3.13.0.

Here are the links to the respective files:
https://drive.google.com/file/d/1xTmiJMs12Du8e069oUoH0lJiCJE8EpPW/view?usp=sharing
https://drive.google.com/file/d/13d4fhOEKoaOCZa7V_8HrC5JRJ0uuQba4/view?usp=sharing

iferres commented 1 year ago

Thank you @jhcuarta , I will look at it today.

iferres commented 1 year ago

Do you think you could make a smaller reproducible example? My little laptop is screaming with 500Mb of compressed gff files 😅 . Make sure you capture the same error with the smaller dataset.

iferres commented 1 year ago

Ah, there's a similar issue with the gffs, every gene has 2 entries: one with tag "gene", and the other "mRNA". Let me see if I can make pagoo handle these cases, otherwise I will continue receiving issues like this one. Thanks both of you for reporting!
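A rough sketch of one way to collapse such files to a single row per gene, keeping only the CDS rows. This is not necessarily the fix that went into pagoo. The toy table is made up, and the presence of a third "CDS" row next to the "gene" and "mRNA" rows is an assumption about how these files look.

```r
# Made-up feature table: each gene is described by a 'gene', an 'mRNA'
# and (assumed) a 'CDS' row that all share one locus_tag.
features <- data.frame(
  locus_tag = c("g1", "g1", "g1", "g2", "g2", "g2"),
  type      = c("gene", "mRNA", "CDS", "gene", "mRNA", "CDS"),
  start     = c(1L, 1L, 1L, 1569L, 1569L, 1569L),
  end       = c(1338L, 1338L, 1338L, 2705L, 2705L, 2705L),
  stringsAsFactors = FALSE
)

# Keep only the CDS rows so that every locus_tag appears once.
cds_only <- features[features$type == "CDS", , drop = FALSE]
rownames(cds_only) <- NULL

# Sanity check: no locus_tag is left duplicated.
stopifnot(anyDuplicated(cds_only$locus_tag) == 0L)
print(cds_only)
```

With one row per locus_tag, identifiers from the gff can be matched one-to-one against the gene identifiers in the presence/absence csv.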

iferres commented 1 year ago

Now it should work @jhcuarta

#Reinstall pagoo from source
devtools::install_github("iferres/pagoo") 

library(pagoo)
p <- panaroo_2_pagoo("gene_presence_absence.csv", list.files(path = "...", pattern="[.]gff$", full.names=T))

jhcuarta commented 1 year ago

Hi iferres, mine is roary_2_pagoo.

iferres commented 1 year ago

Yep, sorry! The fix is the same, since it was a bug in an internal function which is called by both panaroo_2_pagoo and roary_2_pagoo. Try roary_2_pagoo and let me know.

jhcuarta commented 1 year ago

Hi iferres, it ran just fine but threw some warnings. Is there a problem with those?

Loading PgR6MS class object.
Checking class.
Checking dimnames.
Creating gid (gene ids).
Checking provided cluster metadata.
Creating panmatrix.
Populating class.
Checking input sequences.
Checking that sequence names matches with DataFrame.
Adding metadata to sequences.
Done.
There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
2: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
3: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
4: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
5: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
. . .
45: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
46: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
47: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
48: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
49: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
50: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped

iferres commented 1 year ago

Strange. I tried with your full dataset and had no warnings. Can you paste here the output of sessionInfo()?

jhcuarta commented 1 year ago

Hi, this is the output:

Loading PgR6MS class object.
Checking class.
Checking dimnames.
Creating gid (gene ids).
Checking provided cluster metadata.
Creating panmatrix.
Populating class.
Checking input sequences.
Checking that sequence names matches with DataFrame.
Adding metadata to sequences.
Done.
There were 50 or more warnings (use warnings() to see the first 50)

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_CO.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_CO.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_CO.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] pagoo_0.3.18 ggplot2_3.4.1 Biostrings_2.66.0 GenomeInfoDb_1.34.9
[5] XVector_0.38.0 IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0

loaded via a namespace (and not attached):
[1] viridis_0.6.2 httr_1.4.4 sass_0.4.5
[4] tidyr_1.3.0 splines_4.2.2 jsonlite_1.8.4
[7] viridisLite_0.4.1 foreach_1.5.2 bslib_0.4.2
[10] shiny_1.7.4 assertthat_0.2.1 GenomeInfoDbData_1.2.9
[13] lattice_0.20-45 pillar_1.8.1 glue_1.6.2
[16] digest_0.6.31 GenomicRanges_1.50.2 RColorBrewer_1.1-3
[19] promises_1.2.0.1 colorspace_2.1-0 Matrix_1.5-3
[22] htmltools_0.5.4 httpuv_1.6.8 plyr_1.8.8
[25] pkgconfig_2.0.3 zlibbioc_1.44.0 purrr_1.0.1
[28] xtable_1.8-4 scales_1.2.1 webshot_0.5.4
[31] later_1.3.0 tibble_3.1.8 mgcv_1.8-41
[34] generics_0.1.3 ellipsis_0.3.2 DT_0.27
[37] cachem_1.0.6 withr_2.5.0 lazyeval_0.2.2
[40] cli_3.6.0 magrittr_2.0.3 crayon_1.5.2
[43] mime_0.12 heatmaply_1.4.2 fansi_1.0.4
[46] nlme_3.1-162 MASS_7.3-58.2 vegan_2.6-4
[49] shinydashboard_0.7.2 tools_4.2.2 registry_0.5-1
[52] data.table_1.14.6 lifecycle_1.0.3 stringr_1.5.0
[55] plotly_4.10.1 munsell_0.5.0 cluster_2.1.4
[58] compiler_4.2.2 jquerylib_0.1.4 ca_0.71.1
[61] rlang_1.0.6 grid_4.2.2 RCurl_1.98-1.10
[64] iterators_1.0.14 rstudioapi_0.14 htmlwidgets_1.6.1
[67] bitops_1.0-7 shinyWidgets_0.7.6 gtable_0.3.1
[70] codetools_0.2-19 DBI_1.1.3 TSP_1.2-2
[73] reshape2_1.4.4 R6_2.5.1 seriation_1.4.1
[76] gridExtra_2.3 dplyr_1.1.0 fastmap_1.1.0
[79] utf8_1.2.3 ggfortify_0.4.15 permute_0.9-7
[82] dendextend_1.16.0 stringi_1.7.12 parallel_4.2.2
[85] Rcpp_1.0.10 vctrs_0.5.2 tidyselect_1.2.0

iferres commented 1 year ago

You shouldn't have any problems. I will try to update my setup in the next few days to debug this warning; I'm working with slightly older versions of R and packages right now. But pagoo should work if it could load the object, since many checks are done when a pangenome object is initialized.

malihaaziz commented 1 year ago

Hi, I am now using bakta and ran panaroo to get the pangenome. I'm now getting this error:

setwd("/lustre/groups/lab/mlaziz/Dp/panaroo-020923-bakta/020923_Dp-bakta-panarooV1.3.2")
gffs <- list.files(path = "/lustre/groups/lab/mlaziz/Dp/panaroo-020923-bakta/gff", pattern = "[.]gff3$", full.names = TRUE)
gpa_csv <- "gene_presence_absence.csv"
p <- panaroo_2_pagoo(gene_presence_absence_csv = gpa_csv, gffs = gffs)
Reading csv file (panaroo).
Processing csv file.
Removing 314 genes tagged as 'refound', 'stop', and/or 'length' by panaroo.
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/01_A1.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/02_B4.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/02_F4.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/03_C2.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/05_A8.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/33_A7.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/44MNt_B4_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/63VAs-B3_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/63VAs-Sm1_contigs.gff3
Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord, : solving row 847: 'allow.nonnarrowing' is FALSE and the supplied end (145482) is > refwidth

I've attached a shorter example. Can you please test it with your version of the code? test.zip

iferres commented 1 year ago

I think I found it. The thing is that bakta is probably identifying features which start near the end of the contig and finish after its beginning, but it reports the end of the feature as an integer larger than the length of the contig. The parser I'm using gets confused by this. I have some meetings right now; I will do my best to fix it this afternoon.
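To make the coordinate problem concrete, here is a small base R sketch with a toy contig. It assumes the contig is circular and that a reported end beyond the contig length means the feature wraps around to the start. `extract_feature()` is a hypothetical helper for illustration, not the change made in pagoo.

```r
# Toy contig of length 10 (made-up sequence).
contig <- "ACGTACGTAC"

# A 5-base feature starting at position 8: its end is reported as 12,
# which is larger than the contig length.
feat_start <- 8L
feat_end   <- 12L

# Hypothetical helper: extract a feature, wrapping around the end of a
# contig that is assumed to be circular.
extract_feature <- function(seq, start, end) {
  len <- nchar(seq)
  if (end <= len) {
    return(substr(seq, start, end))
  }
  # Tail of the contig followed by the head, up to the overhang.
  paste0(substr(seq, start, len), substr(seq, 1L, end - len))
}

feat <- extract_feature(contig, feat_start, feat_end)
print(feat)

# The extracted length matches the reported coordinates.
stopifnot(nchar(feat) == feat_end - feat_start + 1L)
```

A parser that requires the end to be at most the contig length has to either split such features like this or skip them; which of the two is appropriate depends on the downstream analysis.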

iferres commented 1 year ago

Fixed (I hope 😅). I checked both functions, panaroo_2_pagoo and roary_2_pagoo, with both of your datasets and pagoo loads them fine.

jhcuarta commented 1 year ago

Hi, same error:

Loading PgR6MS class object.
Checking class.
Checking dimnames.
Creating gid (gene ids).
Checking provided cluster metadata.
Creating panmatrix.
Populating class.
Checking input sequences.
Checking that sequence names matches with DataFrame.
Adding metadata to sequences.
Done.
There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
2: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
3: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
4: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
. . .
47: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
48: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
49: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped
50: In FUN(X[[i]], ...) : metadata columns on input DNAStringSet object were dropped

iferres commented 1 year ago

Yes, sorry, I was referring to the bakta error. I was just making sure that I didn't break anything while fixing that. I haven't had the time to look at those warnings in detail, but a quick search tells me that it's just Biostrings downgrading an S4 class to a BString and dropping unnecessary internal object metadata:

https://github.com/Bioconductor/Biostrings/blob/c94e8fb082601cf3e3998df82cf1a9b39c72cb27/R/XStringSet-class.R#L331-L333

Just ignore them.
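If the repeated warning clutters the console, one option is to muffle only that message and let any other warning through. The sketch below uses base R condition handling. `noisy_step()` is a made-up stand-in for the call that emits the warnings, and `muffle_matching()` is a hypothetical helper.

```r
# Made-up stand-in for a call that emits the known warning plus one
# unrelated warning that should stay visible.
noisy_step <- function() {
  warning("metadata columns on input DNAStringSet object were dropped")
  warning("something else worth seeing")
  "result"
}

# Hypothetical helper: silence warnings whose message contains
# 'pattern' and leave every other warning untouched.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Record the warnings that still get through, to show the filter works.
seen <- character(0)
out <- withCallingHandlers(
  muffle_matching(noisy_step(), "metadata columns on input"),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(out)
print(seen)
```

With pagoo loaded, the same wrapper could go around the loading call, for example `muffle_matching(roary_2_pagoo(csv, gffs), "metadata columns on input")`, using the `csv` and `gffs` objects from the scripts earlier in the thread.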

malihaaziz commented 1 year ago

Thank you! I re-pulled and installed pagoo via devtools. There is progress, but now I see this error:

p <- panaroo_2_pagoo(gene_presence_absence_csv = gpa_csv, gffs = gffs)
Reading csv file (panaroo).
Processing csv file.
Removing 314 genes tagged as 'refound', 'stop', and/or 'length' by panaroo.
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/01_A1.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/02_B4.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/02_F4.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/03_C2.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/05_A8.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/33_A7.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/44MNt_B4_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/63VAs-B3_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/63VAs-Sm1_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/68VAs-B3_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/68VPs-B6_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/81UNt-Sm4_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/83VAs-Sm8_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/83VPs-KB5_GCF_007197715.1_ASM719771v1_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/87UNt-Sm4_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/88MNs-Sm2_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/88VPs-Sm9_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/90VAs-B6_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/90VAs-Sm9_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/9VPs-B5_contigs.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/ATCC-51524_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/Dp_81Mnt_Sm4.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1914_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1922_CDC39-95_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1931_CDC4294-98_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1933_CDC4545-98_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1934_CDC4709-98_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1937_CDC4199-99_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL1939_CDC4792-99_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3033_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3043_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3050_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3052_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3065_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3069_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3070_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3077_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3084_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3086_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3090_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3246_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3250_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3256_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3264_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3274_genomic.gff3
Reading gff file /lustre/groups/liu_price_lab/mlaziz/Dp/panaroo-020923-bakta/gff/KPL3911_genomic.gff3
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : row names contain missing values
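The final message in this log is the one base R's `data.frame()` raises when the vector supplied as `row.names` contains `NA`. A minimal sketch with made-up identifiers reproduces it and shows how to locate the missing entries; where the `NA` comes from in pagoo's own code is not shown here.

```r
# Made-up identifiers with one missing value.
ids <- c("gene_1", NA, "gene_3")

# Building a data.frame with these as row names fails; the error is
# caught so that its message can be inspected.
msg <- tryCatch(
  {
    data.frame(value = 1:3, row.names = ids)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Positions of the missing identifiers, checked before building the table.
bad <- which(is.na(ids))
print(bad)
```

Running the `is.na()` check on the identifier vector first points at the records that could not be matched, which is usually more informative than the error itself.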

iferres commented 1 year ago

Did it manage to read all the gffs, or did it fail when reading the last one that appears in the log (KPL3911)?

iferres commented 1 year ago

Another question: which panaroo version are you using? The error looks similar to #57.

malihaaziz commented 1 year ago

panaroo (version 1.3.2), bakta (version 1.6.1). I have 46 gffs in the analysis. It looks like pagoo breaks at the 33rd. I can try the internal pagoo:::read_gff(gff_file) command to see which one it doesn't like.
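A sketch of how that per-file check could be automated: run a reader over every path, catch errors, and report only the files that fail. `find_failing()` and `toy_reader()` are hypothetical names; the toy reader stands in for a real gff parser so that the example runs without pagoo installed.

```r
# Hypothetical helper: apply 'reader' to each file and collect the
# error message of every file that fails.
find_failing <- function(files, reader) {
  status <- vapply(files, function(f) {
    tryCatch(
      {
        reader(f)
        "ok"
      },
      error = function(e) conditionMessage(e)
    )
  }, character(1))
  status[status != "ok"]
}

# Toy reader that fails on any path containing "bad".
toy_reader <- function(path) {
  if (grepl("bad", path, fixed = TRUE)) {
    stop("row names contain missing values")
  }
  invisible(path)
}

files  <- c("a.gff3", "bad.gff3", "c.gff3")
failed <- find_failing(files, toy_reader)
print(failed)
```

With pagoo installed, the reader could be the internal function mentioned in this comment, for example `find_failing(gffs, pagoo:::read_gff)`. Since that function is internal, it may change between versions without notice.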