itmat / CAMPAREE

Configurable And Modular Program Allowing RNA Expression Emulation
GNU General Public License v3.0

"pooled" parameter does not work? #3

Open Argonvi opened 1 year ago

Argonvi commented 1 year ago

Hello ITMAT Bioinformatics Laboratory team,

I have been trying to use CAMPAREE to generate transcripts from an RNA sample. I would like the resulting transcripts to be equal to the reference genome, without any variants. This is so I can introduce specific variants later on and perform benchmarking on variant-calling tools.

In the template.config.yaml file, I found the following description of the "pooled" parameter: Setting this to 'True' will skip all variant-calling, phasing, parental genome construction, and allele-specific quantification operations in the CAMPAREE pipeline. Simulated transcripts will effectively be generated from the haploid reference genome.

However, even with "pooled" set to True, variant calling and phasing still run. In my case, an error during the Beagle step stops the execution, and the log does not say much about the problem:

beagle.28Sep18.793.jar (version 5.0)
Copyright (C) 2014-2018 Brian L. Browning
Enter "java -jar beagle.28Sep18.793.jar" to list command line argument
Start time: 01:18 PM CET on 08 Feb 2023

Command line: java -Xmx16080m -jar beagle.28Sep18.793.jar gt=/media/scratch/rnaseq_read_simulation/analysis/camparee/run_prueba/CAMPAREE/data/all_variants.vcf out=/media/scratch/rnaseq_read_simulation/analysis/camparee/run_prueba/CAMPAREE/data/beagle seed=1273642419 nthreads=1

No genetic map is specified: using 1 cM = 1 Mb

Reference samples: 0
Study samples: 1

Window 1 (chr1:16242-39963300) Study markers: 26,508

brainfood commented 1 year ago

Unfortunately, this feature fell through the cracks. Though we list this option in the config file, we didn't actually implement it in the code. I'm hoping to be able to add, test, and release this functionality over the next 1-2 weeks, workload permitting.

Thanks for catching this and bringing it to our attention.

Argonvi commented 1 year ago

Thank you for your answer. I have been trying to make the program work, but I still get the error shown in my first post. Beagle gets stuck in the first "window" and stops the execution. The test run with the baby genome completes fine. Should I post this as a separate issue?

brainfood commented 1 year ago

Before opening a new issue, I have a couple of questions:

  1. Are you using at least 2 input samples to prime CAMPAREE? Because we haven't implemented the pooled option, CAMPAREE will attempt to identify variants and then use beagle to phase them. This error could arise from trying to perform genotype phasing with a single sample.
  2. Can you post the contents of the "BeagleStep..out" and "BeagleStep..err" log files? They should capture output that the Beagle log file does not.

Based on your answers, we should be able to determine if this is a separate issue. Thanks.
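For anyone who wants to rule out the single-sample case from question 1 before launching Beagle, here is a minimal sketch that counts the sample columns in a VCF header. It relies only on the standard VCF layout (eight fixed columns, then FORMAT, then one column per sample). The function names and the two-sample threshold are assumptions made for illustration, based on the comment above; they are not part of CAMPAREE or Beagle.

```python
import gzip
from pathlib import Path


def vcf_sample_names(vcf_path):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    path = Path(vcf_path)
    # Support both plain-text and gzip-compressed VCF files.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 1-8 are fixed and column 9 is FORMAT;
                # every column after that names one sample.
                return fields[9:]
            if not line.startswith("#"):
                break  # Reached data records without seeing a header line.
    raise ValueError(f"No #CHROM header line found in {vcf_path}")


def has_enough_samples(vcf_path, min_samples=2):
    """Hypothetical pre-flight check; min_samples=2 is an assumption."""
    return len(vcf_sample_names(vcf_path)) >= min_samples
```

Calling `has_enough_samples` on the `all_variants.vcf` path from the Beagle command line would be expected to return `False` for a run that reports "Study samples: 1". Passing this check does not guarantee that phasing will succeed.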

Argonvi commented 1 year ago

I had been running CAMPAREE with only one sample. After reading your comment, I introduced another sample to see if the error would be solved. However, now I am getting a different error, also at the Beagle step.

Attached logs: BeagleStep.log, BeagleStep.serial.err.txt, BeagleStep.serial.out.txt

brainfood commented 1 year ago

This error does appear to be separate from the original issue. I created a new issue to track this error. Please see my comment in the new issue. Thanks!