sjlow23 / pathogd

RPA primer and CRISPR-Cas12a guide RNA design for diagnostics
GNU General Public License v3.0

Empty primer table #1

Open esteinig opened 11 months ago

esteinig commented 11 months ago

@sjlow23 a join error occurs at some stage in pathogd pangenome:

Warning message:
In fread(args[1], header = T, sep = "\t") :
  File 'primer_check/target/all_ontarget_parsed.tsv' has size 0. Returning a NULL data.table.
No primer hits to non-target genomes
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `primer_set`.
Backtrace:
     ▆
  1. ├─... %>% select(-sum_primer_lengths)
  2. ├─dplyr::select(., -sum_primer_lengths)
  3. ├─dplyr::mutate(...)
  4. ├─dplyr::left_join(., primer_lengths, by = c(primer_set = "primerset_uniq"))
  5. └─dplyr:::left_join.data.frame(., primer_lengths, by = c(primer_set = "primerset_uniq"))
  6.   └─dplyr:::join_mutate(...)
  7.     └─dplyr:::join_cols(...)
  8.       └─dplyr:::standardise_join_by(...)
  9.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 10.           └─rlang::abort(bullets, call = error_call)
Execution halted

I would need your help to account for that, as it seems to be related to one of the R scripts. See if you can reproduce the pangenome test on the dev branch and on main (I do not think it's a regression from my small edits so far):

git pull
git checkout dev

pathogd -c config.txt -w check
pathogd -c config.txt -w download_target -o pathogd_output
pathogd -c config.txt -w download_nontarget -o pathogd_output
pathogd -c config.txt -m pangenome -w ncbi_all_nosubsample -o pathogd_output
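
The warning above shows that primer_check/target/all_ontarget_parsed.tsv had size 0 before the join failed. A minimal sketch of a guard that a wrapper could run before the R step, so the run stops with a clearer message than the dplyr backtrace. The function name require_nonempty is hypothetical and not part of pathogd:

```shell
# Hypothetical helper (not part of pathogd): fail early when an
# intermediate table is missing or empty, before any downstream join.
require_nonempty() {
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$1" ]; then
        echo "Error: '$1' is missing or empty; check the earlier steps" >&2
        return 1
    fi
}

# Example: guard the table named in the warning before calling the R step.
if require_nonempty "primer_check/target/all_ontarget_parsed.tsv"; then
    echo "on-target table present, continuing"
else
    echo "stopping before the join step"
fi
```

A check like this would not fix the underlying cause, but it would point at the empty input rather than at the join.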

esteinig commented 11 months ago

I think this may be related to a preceding error in the JVM memory provision of BBMap - my machine has 48 GB of RAM (which should be sufficient), but a number of errors like this occur:

java -ea -Xmx-60m -Xms-60m -cp /data/opt/conda/envs/pathogd/opt/bbmap-38.96-1/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=/data/dev/pathogd/testing/pathogd_output/primers/guides_input_pam.fasta ref=/data/dev/pathogd/testing/pathogd_output/genomes_offtarget/GCA_009941325.1_genomic.fna nodisk noheader=t ambig=all vslow idfilter=.808 nmtag=t xmtag=f nhtag=f amtag=f idtag=t indelfilter=0 maxsites=10000000 threads=4 outm=GCA_009941325.1_wg.sam
Invalid maximum heap size: -Xmx-60m
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
environment: GCA_009941325.1_wg.sam: No such file or directory
rm: cannot remove 'GCA_009941325.1_wg.sam': No such file or directory
java -ea -Xmx-76m -Xms-76m -cp /data/opt/conda/envs/pathogd/opt/bbmap-38.96-1/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=/data/dev/pathogd/testing/pathogd_output/primers/guides_input_pam.fasta ref=/data/dev/pathogd/testing/pathogd_output/genomes_offtarget/GCA_002128165.1_genomic.fna nodisk noheader=t ambig=all vslow idfilter=.808 nmtag=t xmtag=f nhtag=f amtag=f idtag=t indelfilter=0 maxsites=10000000 threads=4 outm=GCA_002128165.1_wg.sam

esteinig commented 11 months ago

Perhaps this is a parallel execution thing gobbling up the memory? Have you seen this before on your machine? I'll try with fewer threads tomorrow.
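
To illustrate the suspicion above, here is a rough sketch of how a fixed memory budget divides across parallel mapping jobs, and how a per-job reserve can push the result below zero. The function names, the 48000 MB budget, and the 2000 MB reserve are illustrative assumptions; this is not how BBMap or pathogd actually compute -Xmx.

```shell
# Hypothetical helper: split a total memory budget (MB) evenly across
# parallel jobs, then subtract a fixed per-job reserve (MB).
heap_per_job_mb() {
    total_mb="$1"; njobs="$2"; reserve_mb="$3"
    echo $(( total_mb / njobs - reserve_mb ))
}

# Build a JVM heap flag only when the computed heap is positive.
xmx_flag() {
    heap=$(heap_per_job_mb "$1" "$2" "$3")
    if [ "$heap" -gt 0 ]; then
        printf '%s\n' "-Xmx${heap}m"
    else
        echo "Error: computed heap ${heap} MB is not positive; use fewer parallel jobs" >&2
        return 1
    fi
}

# Few jobs leave room for a valid heap: prints -Xmx4000m
xmx_flag 48000 8 2000
# Many jobs drive the per-job heap negative, so the flag is refused
xmx_flag 48000 32 2000 || echo "too many parallel jobs for this budget"
```

Refusing to emit a non-positive value would surface the problem before Java reports "Invalid maximum heap size".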

esteinig commented 11 months ago

Reducing the number of threads fixes it (-t 8) - the bug seems to be a memory limitation with too many parallel executions of BBMap. However, for the test config.txt in the repo, it appears that pangenome mode does not find any suitable genes. Can you replicate this on the dev branch?

Error: No target-specific genes found!

esteinig commented 11 months ago

Kmer mode is similarly failing with "no k-mers found" errors - is M. genitalium not suitable for testing? I can wait until you reproduce and see what you think :)

sjlow23 commented 11 months ago

The "no target-specific genes" error is related to the config.txt file: the taxid field variable was updated in the code but not in config.txt. Can you test with the updated config.txt and then rerun these commands?

pathogd -c config.txt -w download_target -o pathogd_output
pathogd -c config.txt -w download_nontarget -o pathogd_output
pathogd -c config.txt -m pangenome -w user_all_nosubsample -o pathogd_output

I used the user_all_nosubsample workflow here as the genomes have already been downloaded.
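
A mismatch like the one described above (a field renamed in the code but not in config.txt) could be caught up front with a small check. This is only a sketch: it assumes KEY=VALUE lines, which may not match the real config.txt layout, and the function name and key names (old_taxid, new_taxid, threads) are placeholders, not actual pathogd fields.

```shell
# Hypothetical check: list keys that a script expects but the config lacks.
# Assumes KEY=VALUE lines; the real config.txt layout may differ.
missing_keys() {
    config="$1"; shift
    for key in "$@"; do
        # Match the key at the start of a line, followed by '='
        if ! grep -q "^${key}=" "$config"; then
            echo "$key"
        fi
    done
}

# Example with a throwaway config and placeholder key names.
demo=$(mktemp)
printf 'old_taxid=0000\nthreads=8\n' > "$demo"
missing_keys "$demo" new_taxid threads   # prints: new_taxid
rm -f "$demo"
```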

esteinig commented 11 months ago

Thank you! I see that it was related to my home setup - will rerun!