Closed marzie-rasekh closed 11 months ago
Hi and thanks for your interest in BEERS2 and CAMPAREE,
Unfortunately, you can't currently run CAMPAREE with a single sample. As you've found, this is because we include a genetic phasing step that requires at least two samples. I'm in the process of patching CAMPAREE so it can skip the Beagle step. Thanks for your patience.
I've patched CAMPAREE to skip the phasing step if the user provides only one sample. Would you be willing to download the version of CAMPAREE in the 'develop' branch (commit: 3fd75eaf93e6b736c551937c2535a71606b01e24) and confirm that it works with your data?
If it works for you, I'll release the patch to the main branch. Thanks!
I ran it. This time it failed at the MoleculeMakerStep with the following error (MoleculeMakerStep.serial.err):
Traceback (most recent call last):
  File "/home/mrasekh/git/CAMPAREE/camparee/molecule_maker.py", line 726, in <module>
    sys.exit(MoleculeMakerStep.main())
  File "/home/mrasekh/git/CAMPAREE/camparee/molecule_maker.py", line 718, in main
    molecule_maker.execute(sample=sample,
  File "/home/mrasekh/git/CAMPAREE/camparee/molecule_maker.py", line 443, in execute
    [read_fasta(os.path.join(sample_data_directory,
  File "/home/mrasekh/git/CAMPAREE/camparee/molecule_maker.py", line 443, in <listcomp>
    [read_fasta(os.path.join(sample_data_directory,
  File "/home/mrasekh/git/BEERS_UTILS/beers_utils/read_fasta.py", line 35, in read_fasta
    raise ValueError(f"Invalid characters found in the fasta file {fasta_file}: all must be in ACGTN")
ValueError: Invalid characters found in the fasta file /run_1/CAMPAREE/data/sample1/custom_genome_1.fa: all must be in ACGTN
Would this be because of some R and Y characters in the reference genome?
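One way to check whether ambiguity codes such as R and Y are the cause is to count every character outside ACGTN in the reference. The following is a minimal sketch and not part of CAMPAREE or BEERS_UTILS: the function name `count_invalid_bases` is hypothetical, and it assumes that lowercase bases should be treated like their uppercase equivalents, which the error message does not confirm.

```python
from collections import Counter

VALID_BASES = set("ACGTN")


def count_invalid_bases(fasta_lines):
    """Count characters outside ACGTN for each sequence in FASTA-formatted lines.

    Returns a dict mapping sequence name to a Counter of offending characters.
    Sequences with no offending characters are left out of the result.
    """
    invalid = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            continue
        # Assumption: case is ignored when deciding what is valid.
        for base in line.upper():
            if base not in VALID_BASES:
                invalid.setdefault(name, Counter())[base] += 1
    return invalid
```

Because the function takes any iterable of lines, an open file handle on the reference FASTA can be passed directly, for example `with open(path) as handle: report = count_invalid_bases(handle)`.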
I fixed the reference and reran the pipeline on two samples separately. It took a very long time, even with 36 threads (wherever possible), but the pipeline ran successfully. Thank you.
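For anyone hitting the same ValueError, one possible way to fix a reference is to mask every character outside ACGTN as N. This is only a sketch under stated assumptions, not necessarily what was done here: `mask_ambiguous_bases` is a hypothetical helper, it uppercases the sequence (so soft-masking is lost), and replacing an ambiguity code with N discards the partial information the code carried.

```python
def mask_ambiguous_bases(fasta_lines):
    """Yield FASTA lines with sequence characters outside ACGTN replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    Sequence lines are uppercased first, so soft-masked bases become uppercase.
    """
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            yield line
        else:
            yield "".join(base if base in "ACGTN" else "N" for base in line.upper())


# Example usage with hypothetical file names:
# with open("reference.fa") as src, open("reference.masked.fa", "w") as dst:
#     for masked_line in mask_ambiguous_bases(src):
#         dst.write(masked_line + "\n")
```

Masking with N keeps coordinates unchanged, which matters if annotation files refer to positions in the original reference.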
Thank you very much for testing the patch, and for your feedback! CAMPAREE is fairly compute-intensive, so the runtime isn't too surprising. It's effectively running a full alignment, gene/intron/transcript quantification, and variant calling pipeline on each sample. If you weren't already, running it in a cluster environment tends to speed things up more than adding threads. That's an area for optimization we should explore further.
I'm closing this issue as resolved, but I'm marking down support for non-standard bases as a potential feature to add in future releases.
When running CAMPAREE, I get an error from BEAGLE.
It looks like Beagle is complaining that "ERROR: there is only one sample". I got this by using a newer version of Beagle. How can I run CAMPAREE on single-sample RNA-seq data (only one pair of FASTQ files)?
Here are the error messages: BeagleStep.log (exit code 1) and BeagleStep.serial.err.