mritchielab / FLAMES

A framework for performing single-cell and bulk read full-length analysis of mutations and splicing.
https://mritchielab.github.io/FLAMES/
GNU General Public License v3.0

passing bam files to `sc_long_multisample_pipeline` #48

Open sparthib opened 1 week ago

sparthib commented 1 week ago

Hi there,

I have generated BAM files for all my samples manually using minimap2. What format should I use to pass these to `sc_long_multisample_pipeline`? The documentation lists options for passing FASTQ files to the function; is it similar for BAMs?

Thanks, Sowmya

ChangqingW commented 1 week ago

You can copy or symlink the BAM files into the output folder and name them `[corresponding_fastq_file_name]_align2genome.bam`. For example, if you have sample1.fastq and sample2.fastq, placing sample1_align2genome.bam and sample2_align2genome.bam in the output folder should make FLAMES skip the alignment step and use the provided BAMs. The same works for realignment (sample1_realign2transcript.bam).
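A minimal R sketch of the staging step described above. `stage_bams()` is a hypothetical helper, not part of FLAMES. It assumes the file-name prefix FLAMES expects is the FASTQ file name with `.fastq`/`.fq` (optionally `.gz`) stripped. Check this against the naming your FLAMES version actually uses. BAMs are paired with FASTQs by position, and `suffix` can be switched to `"_realign2transcript.bam"` for the realignment case.

```r
# Hypothetical helper (not part of FLAMES): place pre-computed minimap2 BAMs in
# the FLAMES output folder under the names FLAMES looks for, so the genome
# alignment step can be skipped.
# Assumption: prefix = FASTQ file name minus .fastq/.fq(.gz).
stage_bams <- function(bam_files, fastq_files, outdir,
                       suffix = "_align2genome.bam", copy = FALSE) {
  stopifnot(length(bam_files) == length(fastq_files))
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  # e.g. "sample1.fastq.gz" -> "sample1"
  prefixes <- sub("\\.(fastq|fq)(\\.gz)?$", "", basename(fastq_files))
  targets <- file.path(outdir, paste0(prefixes, suffix))
  # Absolute source paths keep the symlinks valid from inside outdir
  sources <- normalizePath(bam_files, mustWork = TRUE)
  ok <- if (copy) {
    file.copy(sources, targets, overwrite = TRUE)
  } else {
    file.symlink(sources, targets)
  }
  if (!all(ok)) stop("failed to stage: ", paste(targets[!ok], collapse = ", "))
  invisible(targets)
}

# Usage (paths are placeholders):
# stage_bams(c("bams/s1.bam", "bams/s2.bam"),
#            c("fastq/sample1.fastq", "fastq/sample2.fastq"),
#            outdir = "flames_out")
```

Symlinking avoids duplicating large BAMs. `copy = TRUE` is there for file systems where symlinks are unavailable.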

sparthib commented 5 days ago

Great, thanks!

sparthib commented 1 day ago

Is there a similar multi-sample pipeline for bulk samples? Thanks!

Sowmya

ChangqingW commented 1 day ago

For now, you could put all FASTQs into one folder and provide the path to that folder. Each FASTQ file is treated as a sample, and a matching `[corresponding_fastq_file_name]_align2genome.bam` file lets FLAMES skip the alignment step. The plan is to make this work like `sc_long_multisample_pipeline` in the devel branch, where you can simply provide a named vector whose values are paths to either a folder or a file:

```r
sc_long_multisample_pipeline(
  fastqs = c(
    "sample1" = file.path(outdir, "fastq", "your_fq_folder_for_1st_sample"),
    "sample2" = file.path(outdir, "fastq", "your_second_fq.fq.gz"),
    "sample3" = file.path(outdir, "fastq", "third.fq.gz")
  ),
  ...
)
```

The names are then used as sample names.
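For the current bulk workaround (one folder, one sample per FASTQ), a short R sketch shows how a folder maps to sample names and which BAM names would let alignment be skipped. `fastq_sample_vector()` and `expected_bams()` are hypothetical helpers, not FLAMES functions. The sample-name rule (file name minus `.fastq`/`.fq`, optionally `.gz`) is an assumption. The returned named vector has the shape the thread describes for the planned devel-branch `fastqs` argument, but check the devel documentation before passing it there.

```r
# Hypothetical helpers (not part of FLAMES).
# Assumption: sample name = FASTQ file name minus .fastq/.fq(.gz).
fq_pattern <- "\\.(fastq|fq)(\\.gz)?$"

# One entry per FASTQ in the folder, named by sample
fastq_sample_vector <- function(fq_dir) {
  files <- list.files(fq_dir, pattern = fq_pattern, full.names = TRUE)
  stats::setNames(files, sub(fq_pattern, "", basename(files)))
}

# BAM paths that would let FLAMES skip genome alignment for each sample
expected_bams <- function(samples, outdir) {
  file.path(outdir, paste0(samples, "_align2genome.bam"))
}

# Usage (paths are placeholders):
# fqs <- fastq_sample_vector(file.path(outdir, "fastq"))
# expected_bams(names(fqs), outdir)
```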