yanhui09 / laca

A reproducible and scalable workflow for Long Amplicon Consensus Analysis (LACA)
GNU General Public License v3.0

Organising folders to analyse multiple samples (and amplicons) at once #3

Open dgslos opened 1 year ago

dgslos commented 1 year ago

Hi,

I would like to analyse multiple samples at once after demultiplexing. As input data I have one fasta file for each sample and each amplicon.

I'm not sure how to organise my data, and I could not find this in the documentation. I can see several options:

demultiplexed_data/
├─ sample1/
│  ├─ amplicon1/
│  ├─ amplicon2/
├─ sample2/
│  ├─ amplicon1/
│  ├─ amplicon2/

Or

demultiplexed_data1/
├─ sample1/
│  ├─ amplicon1/
├─ sample2/
│  ├─ amplicon1/
demultiplexed_data2/
├─ sample1/
│  ├─ amplicon2/
├─ sample2/
│  ├─ amplicon2/

The desired output files after clustering would look like this:

clustered_consensus_all_samples_amplicon1.fasta
clustered_consensus_all_samples_amplicon2.fasta

With a corresponding count table for each amplicon, like this:

| OTU  | sample1 | sample2 |
|------|---------|---------|
| OTU1 | 0       | 250     |
| OTU2 | 142     | 0       |
| OTU3 | 143     | 1653    |
| ...  | ...     | ...     |

Is laca currently able to produce output like this? And should I analyse one amplicon at a time or are multiple amplicons possible?

Thanks!

yanhui09 commented 1 year ago

Hi,

Thank you for your interest in laca. You need to analyze one amplicon at a time. Just make sure the demultiplexed directories are named in the format [a-zA-Z]+[0-9]+ (for example, sample1), and that the unzipped fastq files are placed in their respective directories.

demultiplexed_data1/
├─ sample1/
│  ├─ batch1.fastq
│  ├─ batch2.fastq
├─ sample2/
│  ├─ batch1.fastq
│  ├─ batch2.fastq
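
The layout above could be checked before a run with a short script. This is only a sketch and not part of laca: `check_demultiplexed_dir` is a hypothetical helper, and it assumes that the whole directory name must match the pattern and that unzipped reads end in `.fastq`.

```python
import re
from pathlib import Path

# Pattern quoted in the reply: letters followed by digits, e.g. "sample1".
# Assumption: the whole directory name has to match, not just a part of it.
NAME_PATTERN = re.compile(r"[a-zA-Z]+[0-9]+")


def check_demultiplexed_dir(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue  # only sub-directories are treated as samples
        if not NAME_PATTERN.fullmatch(entry.name):
            problems.append(f"{entry.name}: name does not match [a-zA-Z]+[0-9]+")
        # Assumption: unzipped reads carry a plain ".fastq" suffix.
        fastqs = [p for p in entry.iterdir() if p.suffix == ".fastq"]
        if not fastqs:
            problems.append(f"{entry.name}: no unzipped .fastq files found")
    return problems
```

Calling `check_demultiplexed_dir("demultiplexed_data1")` returns an empty list when every sample directory passes both checks.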

Best,
Yan
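
Since each amplicon has to be analysed separately, data organised as in the first layout of the question (sample/amplicon) would need to be split into one directory per amplicon. Below is a minimal sketch of one way to do that. `split_by_amplicon` is a hypothetical helper, not a laca command, and it copies files rather than moving them so that the original data stays untouched.

```python
import shutil
from pathlib import Path


def split_by_amplicon(src_root, dst_root):
    """Copy src_root/<sample>/<amplicon>/* to dst_root/<amplicon>/<sample>/*."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for sample_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        for amplicon_dir in sorted(p for p in sample_dir.iterdir() if p.is_dir()):
            # One top-level directory per amplicon, one sub-directory per sample.
            target = dst_root / amplicon_dir.name / sample_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for read_file in amplicon_dir.iterdir():
                if read_file.is_file():
                    shutil.copy2(read_file, target / read_file.name)
```

After `split_by_amplicon("demultiplexed_data", "by_amplicon")`, each directory such as `by_amplicon/amplicon1/` holds one sub-directory per sample and can be analysed in its own run.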