data import takes a long time

metabaRfactory / metabaR

metabaR is an R package to curate and visualise DNA metabarcoding data after basic bioinformatics analyses.

http://metabaRfactory.github.io/metabaR

data import takes a long time #23

Closed rturba closed 2 years ago

rturba commented 3 years ago

Hello,

This package looks amazing, and I am really excited because it seems to help with the issue of tag-jumping. The problem is that handling a full sequencing run with this package is currently unmanageable. The output files for full runs are really big, and just importing the data with tabfiles_to_metabarlist takes forever. I cannot see the point of exploring this tool if it cannot handle large files, so I was wondering whether there is a workaround for this issue.

I am trying my best to deal with this, but I cannot be the only one that plans to load a full run. Was this not supposed to deal with full PCR plates?

Thank you in advance!

rturba commented 3 years ago

So, I'm not very good at coding, but I've managed to speed things up a little when reading my motus and reads tables by adjusting the read.table options and switching to read_csv:

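library(readr)   # read_csv(), show_progress()
library(tibble)  # column_to_rownames()

motus <- read_csv("filename.csv",
                  col_names = TRUE,
                  progress = show_progress())
# The unnamed first column is read in as "X1" (readr < 2.0) or "...1" (readr >= 2.0)
colnames(motus)[colnames(motus) %in% c("X1", "...1")] <- "seq_number"
motus <- column_to_rownames(motus, "seq_number")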

Because it's a tibble, I had to adjust it afterwards to have rownames.
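
For anyone who would rather stay with base R, here is a minimal sketch of the "adjusting the read.table options" idea. It is not taken from metabaR itself: the demo table, its column names and its column types are made up for illustration, so adapt colClasses to your own file. Declaring the column types and turning off comment scanning saves read.table some guessing work, and row.names = 1 gives row names directly without going through a tibble.

```r
# Hypothetical stand-in for a motus table: the first column holds sequence ids.
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(
  seq_number = c("seq1", "seq2", "seq3"),
  count      = c(10L, 5L, 2L),
  sequence   = c("acgt", "ttga", "ccat")
)
write.csv(demo, tmp, row.names = FALSE)

# Assumed column types; colClasses also covers the column used as row names.
motus <- read.table(
  tmp,
  header           = TRUE,
  sep              = ",",
  row.names        = 1,    # use the first column as row names
  colClasses       = c("character", "integer", "character"),
  comment.char     = "",   # do not scan for comment characters
  stringsAsFactors = FALSE
)

print(rownames(motus))
print(str(motus))
```

Whether this beats read_csv on a given file is something to check with system.time() on your own data.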

lzinger commented 3 years ago

Hi! Sorry for seeing this issue so late. Even if you have a full run, the size of your file should be strongly reduced after demultiplexing/dereplication and clustering in a more classical bioinformatic pipeline. Could you please tell me the size of your csv?

rturba commented 3 years ago

Hi, sorry for such a delay! It's 500 KB. I have also moved this analysis to a more powerful computer and was able to process it faster.