ksahlin / isONcorrect

Error correction of ONT transcript reads
GNU General Public License v3.0

Run isONcorrect on multiple PCs in parallel #9

Closed jkbenotmane closed 3 years ago

jkbenotmane commented 3 years ago

Would it be possible to run isONcorrect on multiple computers connected via LAN?

ksahlin commented 3 years ago

You can run isONcorrect with the parallelization-across-nodes setting described in the README. Simply pass `--split_mod n --residual x` (x in 0, 1, ..., n-1) to the `run_isoncorrect` script and each computer will take care of a separate subset of clusters.

If such a setup shares a common disk space, simply write the corrected reads to the same shared outfolder. If not, you would have to copy the corrected reads back to some shared space, e.g. Dropbox.
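As a concrete sketch, each machine would launch its own `run_isoncorrect` with a different residual. Only `--split_mod`/`--residual` are from this thread; the other flags, folder paths, and core count below are placeholders to adapt to your own setup:

```python
# Hypothetical sketch: build the run_isoncorrect command line for one of n
# machines. Only --split_mod/--residual come from this thread; the other
# flags and paths are placeholders.
def isoncorrect_cmd(fastq_folder, outfolder, n_machines, residual, cores=16):
    return [
        "run_isoncorrect",
        "--fastq_folder", fastq_folder,
        "--outfolder", outfolder,
        "--t", str(cores),
        "--split_mod", str(n_machines),
        "--residual", str(residual),
    ]

# Machine 0 of 3:
cmd = isoncorrect_cmd("clusters/fastq", "/shared/out", 3, 0)
print(" ".join(cmd))
```

Machines 1 and 2 would run the same command with `residual=1` and `residual=2`.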

ksahlin commented 3 years ago

Updated previous comment.

jkbenotmane commented 3 years ago

I read about it, but could you give me some advice on how to set up the nodes (links for reading, e.g. an example Snakefile, anything really; I do not have a bioinformatics background)? I have access to three powerful desktops, but they are not connected to a cluster.

The idea with Dropbox is really nice, thanks!

ksahlin commented 3 years ago

Unfortunately, I don't know how to do any advanced setup, so what I suggest is a bit clunky. I would either:

1. Run the [pipeline](https://github.com/ksahlin/isONcorrect#manually) up to the `run_isoncorrect` step on all desktops.
2. Run the `run_isoncorrect` step with `--split_mod 3 --residual x`, where a given residual x = 0, 1, 2 corresponds to a given desktop.
3. Combine the output on, e.g., Dropbox.

You can then delete the redundant copies of the clustering output. Note that in this approach you have to use the same number of cores for isONclust on all desktops, as the parallel version does not guarantee identical output with different core counts.

Alternatively, safer but a bit clunkier:

1. Run the [pipeline](https://github.com/ksahlin/isONcorrect#manually) up to the `run_isoncorrect` step on one desktop.
2. Copy the clustered fastq files to the other two computers.
3. Run the `run_isoncorrect` step with `--split_mod 3 --residual x`, where a given residual x = 0, 1, 2 corresponds to a given desktop.
4. Combine the output on, e.g., Dropbox.

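The combining step can be as simple as concatenating the per-desktop result files once they are in the shared folder. A minimal sketch (the glob pattern and paths are placeholders for however you lay out the shared space):

```python
# Merge corrected fastq files collected from the desktops into one file.
import glob
import shutil

def merge_fastq(pattern, merged_path):
    paths = sorted(glob.glob(pattern))
    with open(merged_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
    return paths

# e.g. merge_fastq("/shared/out/desktop*/corrected.fastq",
#                  "/shared/all_corrected.fastq")
```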
jkbenotmane commented 3 years ago

Thank you Kristoffer, that seems doable and compatible with the rest of my pipeline!

Will go for the safer second option (and try not to run into the core problem), as clustering is fast even for bigger runs with isONclust2.

jkbenotmane commented 3 years ago

Hello Kristoffer, after running isONcorrect for 10 days on a desktop with 16 cores and 380 GB RAM, only about 40,000 reads were corrected. The multi-node run is ongoing but has only output about 10,000 more reads. I am running run_isoncorrect with the optimized parameters `--k 9 --w 20 --max_seqs 2000`.

Even after extracting reads from a first alignment, I have to be able to deal with at least 16M reads. Is there anything else I could do to optimize runtime?

Or did I maybe make a mistake in creating full-length transcripts? I ran pychopper with the following options: `-m edlib -b 10XPrimers_pychopper.fa -p -c 10XPrimers_pychopper_configuration.txt -t 16`

I want to keep the primers as anchors for later barcode detection of 10X-barcoded transcripts.

ksahlin commented 3 years ago

It seems like you have access to about the same resources we used for the datasets in our paper. That is very slow, and it sounds like something is not working as expected.

What is the mean or median read length?

After isONclust clustering, how many clusters were generated, and what is the size of the largest cluster?

If you are running isONcorrect with 16 cores, it should start 16 individual processes (one per cluster of reads). Are you observing output from all of these clusters?

Maybe you can check what output is generated for each instance (cluster) of isONcorrect. I think this information is available and written to a file in the output folder, although I could be wrong (I don't have access to a computer at the moment as I'm on holiday).
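One way to check that last point is a small script that counts which per-cluster output folders already contain corrected reads. The result filename below is an assumption; adjust it to whatever isONcorrect actually writes on your system:

```python
# Count cluster output folders that already contain corrected reads.
# "corrected_reads.fastq" is an assumed filename; adjust if yours differs.
import os

def finished_clusters(outfolder, result_name="corrected_reads.fastq"):
    done = []
    for entry in os.scandir(outfolder):
        if entry.is_dir() and os.path.isfile(os.path.join(entry.path, result_name)):
            done.append(entry.name)
    return sorted(done)
```

Running this periodically on the outfolder shows how many of the started clusters have actually completed.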

jkbenotmane commented 3 years ago

Thank you for responding even on your holidays, I appreciate that!

After the clustering with Pipeline-discover-denovo-isoforms, 8218 clusters were generated, of which only 2101 contain >= 10 reads. The biggest cluster contains 896,926 reads; roughly <50 clusters contain more than 10k reads, and approximately 20 contain more than 100k reads.

The clustering is performed using the following configuration:

aligned_threshold: 0.4
batch_max_seq: -1
batch_size: -1
cls_mode: sahlin
concatenate: false
consensus_maximum: -150
consensus_minimum: 50
consensus_period: -1
cores: 8
kmer_size: 11
mapped_threshold: 0.7
min_fraction: 0.8
min_left_cls: 2
min_prob_no_hits: 0.1
min_qual: 7.0
min_shared: 5
pipeline: 09_08_21
pychopper_opts: -m edlib -b /Primer/10XPrimers_pychopper.fa
  -p -c /Primer/10XPrimers_pychopper_configuration.txt
  -t 8
reads_fastq:  /Only_Mapped_Reads/09_08_21_Aln_mapped.fastq
repo: https://github.com/nanoporetech/pipeline-isONclust2.git
run_pychopper: true
window_size: 15
workdir_top: OUTFOLDER_06_08/isOnClust2

I see output from more than 16 clusters, as the correction starts with the smallest clusters; I therefore see output from thousands of clusters.

In the not-yet-corrected clusters I see a stderr.txt containing the configuration isONcorrect uses for that cluster, e.g.:

Total cluster of 896926 reads.
ARGUMENTS SETTINGS:
fastq isonclust2/final_clusters/cluster_fastq/0.fastq
k: 9
w: 20
xmin: 18
xmax: 80
T: 0.1
exact: False
disable_numpy: False
max_seqs_to_spoa: 200
max_seqs: 2000
use_racon: False
exact_instance_limit: 50
set_w_dynamically: False
verbose: False
randstrobes: False
set_layers_manually: False
compression: False
outfolder: isOncorrect/fastq_files/0

Example stdout.txt from another run:

Temporary workdirektory: /tmp/tmpe7l8ggtj
correcting 2 reads in a batch
Window used for batch: 20
3654 MINIMIZER COMBINATIONS GENERATED
Too abundant: ACACGACGC TTTTTTTTT 20 2
Too abundant: ACGACGCTC TTTTTTTTT 20 2
Too abundant: ACGCTCTTC TTTTTTTTT 20 2
Too abundant: ATCTGGAAT TTTTTTTTT 13 2
Too abundant: AATAACCTC TTTTTTTTT 13 2
Too abundant: AACCTCAAG TTTTTTTTT 13 2
Too abundant: TTTTTTTTT ACCTAGAGC 13 2
Too abundant: TTTTTTTTT ACATTCTCC 13 2
Too abundant: TTTTTTTTT AACATTCTC 13 2
Too abundant: TTTTTTTTT AAACCCCTC 13 2
Too abundant: TTTTTTTTT AAAATAAAC 13 2
Too abundant: TTTTTTTTT AAAAATAAA 13 2
Too abundant: TTTTTTTTT AAAAAATAA 13 2
Too abundant: TTTTTTTTT AAAAAAATA 13 2
Too abundant: TTTTTTTTT AAAAAAAAT 13 2
Too abundant: TTTTTTTTT AAAAAAAAA 13 2
Too abundant: TTTTTTTTT AACACTATT 13 2
Too abundant: TTTTTTTTT GAACACTAT 13 2
Too abundant: TTTTTTTTT GTGAACACT 13 2
Too abundant: TTTTTTTTT TGTGAACAC 13 2
Too abundant: TTTTTTTTT TTGTGAACA 12 2
Too abundant: TTTTTTTTT TTTGTGAAC 11 2
Too abundant: TTTTTTTTT TTTTGTGAA 10 2
Too abundant: TTTTTTTTT TTTTTGTGA 9 2
Too abundant: AACTCGAAT TTTTTTTTT 8 2
Too abundant: TTTTTTTTT TTTTTTGTG 8 2
Too abundant: TTTTTTTTT TTTTTTTGT 7 2
Too abundant: TTTTTTTTT AACACGCCC 7 2
Too abundant: TTTTTTTTT ACAACACGC 7 2
Too abundant: TTTTTTTTT ACATACAAC 7 2
Too abundant: TTTTTTTTT AGCAGGACA 7 2
Too abundant: TTTTTTTTT AAGGCATCG 7 2
Too abundant: TTTTTTTTT AATGAAGGC 7 2
Too abundant: TTTTTTTTT AATCATTTT 7 2
Too abundant: TTTTTTTTT AAATCATTT 7 2
Too abundant: TTTTTTTTT CAAATCATT 7 2
Too abundant: TTTTTTTTT CCAAATCAT 7 2
Too abundant: TTTTTTTTT CTCTTCCAA 7 2
Too abundant: TTTTTTTTT GCTCTTCCA 7 2
Too abundant: TTTTTTTTT TGCTCTTCC 7 2
Too abundant: ATCTTATGG TTTTTTTTT 7 2
Too abundant: AGTGGGAGA TTTTTTTTT 7 2
Too abundant: TTTTTTTTT TTTTTTTTG 6 2
Too abundant: TTTTTTTTT TTGCTCTTC 6 2
Too abundant: TTTTTTTTT TTTGCTCTT 5 2
Too abundant: TTTTTTTTT ACACACAGC 5 2
Too abundant: TTTTTTTTT AAGAGAACC 4 2
Too abundant: TTTTTTTTT TTTTGCTCT 4 2
Too abundant: TTTTTTTTT TTTTTGCTC 3 2
Too abundant: TTTTTTTTT ACACAGCAT 3 2
Too abundant: AGATACTTT TTTTTTTTT 3 2
Average abundance for non-unique minimizer-combs: 3.2532467532467533
Number of singleton minimizer combinations filtered out: 2652

Done with batch_id: 0
Took 0.0162966251373291 seconds.
removing temporary workdir

It's hard, though, to check the current state of the correction, as it either just outputs the correct.fastq file or it doesn't.

At the moment I have been running the correction for two days with the clusters split into 4, using the parameters mentioned above, on a 32-core machine ("node") with 160 GB memory (of which it only uses 33 GB), and it is not even done with residual one yet.

ksahlin commented 3 years ago

I see, this dataset seems to have a highly variable cluster size distribution. Please see how we ran run_isoncorrect on the SIRV dataset in our publication. (I think it's with the --split_batches parameter; I still don't have access to a computer to check the exact parameter, but see the paper.) This will process the largest clusters in parallel and speed up correction roughly 10-fold.

ksahlin commented 3 years ago

To clarify, --split_wrt_batches is the argument to specify for run_isoncorrect. Here is the full command we used in the analysis in the paper.

The SIRV dataset took 6 h to correct (~1.5M reads) using 32 cores. If you have about 10x more reads, the same number of cores, and your cDNA reads are about the same length as the ones in the paper (median read length of about 550 bp), a ballpark estimate of the runtime would be 60 h (perhaps adding some extra time for data-size scale-up). However, the new parameter setting should be at least 2x faster than the one used in the paper.
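The ballpark above is just linear scaling in read count (ignoring cluster-size effects and data-size overheads):

```python
# Back-of-envelope scaling of the SIRV runtime to 16M reads
# (same core count, similar read lengths assumed).
sirv_hours = 6.0      # ~1.5M SIRV reads corrected in 6 h on 32 cores
sirv_reads = 1.5e6
target_reads = 16e6
estimate_hours = sirv_hours * target_reads / sirv_reads
print(estimate_hours)  # on the order of 60 h before further speed-ups
```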

Best, K

jkbenotmane commented 3 years ago

Hello Kristoffer, again thank you!

In the Snakefile, if I understood correctly, you also specified --set_w_dynamically and --xmax 80. Could this also have a major impact on compute time?

ksahlin commented 3 years ago

No, it should not have a major impact. The default --xmax is already set to 80, and --set_w_dynamically may only have a minor effect: it uses more accurate settings for smaller clusters but, on the whole, will not affect runtime much. I recommend setting this parameter too.

jkbenotmane commented 3 years ago

Okay, will try. What I also figured: wouldn't it be possible to use racon's CUDA implementation with use_racon?

ksahlin commented 3 years ago

If the CUDA implementation is installed and has the same interface (input/output) as the regular racon, I think it should work, as isONcorrect simply calls the racon binary as:

subprocess.check_call(['racon', reads_to_center, read_alignments_paf, center_file], stdout=racon_polished, stderr=racon_stderr)

Meaning that the CUDA binary needs to be named racon and the arguments to this binary need to be in the same order as in the regular version.
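One way to satisfy that constraint without touching isONcorrect would be a small wrapper saved as an executable file named `racon` and placed ahead of the real binary on PATH. Everything here (the CUDA binary path and the extra flags) is an assumption to adapt:

```python
#!/usr/bin/env python3
# Hypothetical "racon" wrapper: forwards isONcorrect's positional arguments
# to a CUDA build of racon with extra flags prepended. The binary path and
# flag values are placeholders.
import os
import sys

CUDA_RACON = "/path/to/racon-gpu/build/bin/racon"
EXTRA_FLAGS = ["-c", "16", "-b", "--cudaaligner-batches", "16"]

def build_argv(positional_args):
    # Binary, then flags, then the three files isONcorrect passes.
    return [CUDA_RACON] + EXTRA_FLAGS + list(positional_args)

if __name__ == "__main__" and os.path.exists(CUDA_RACON):
    # Replace this process with the CUDA racon (guarded so the sketch can
    # also be imported or run on a machine without the binary).
    os.execv(CUDA_RACON, build_argv(sys.argv[1:]))
```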

jkbenotmane commented 3 years ago

Hi @ksahlin ,

It does have a different interface, but I think it should be easily adaptable, and CUDA-based racon could accelerate and improve the resulting correction.

The command in line 251 of create_augmented_reference.py would then be something like this, I think:

subprocess.check_call(['/Path/to/racon/build/bin/racon', '-c', '16', '-b', '--cudaaligner-batches', '16', reads_to_center, read_alignments_paf, center_file], stdout=racon_polished, stderr=racon_stderr)

Therefore run_isoncorrect could either stick with expecting racon in PATH or allow more fine-grained parameter tuning of racon through its arguments.

Maybe allow something like

--use_racon "Path to Racon" --racon_params "-c 16 -b --cudaaligner-batches 16 ...."

Though I am also not sure if this could interfere with other lines of code.
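A sketch of how the proposed `--racon_params` string could be folded into the existing call (the `--use_racon`/`--racon_params` interface is the proposal above, not an existing isONcorrect option):

```python
# Split a user-supplied racon parameter string shell-style and splice it
# into the command list before the three positional files.
import shlex

def build_racon_cmd(racon_bin, racon_params, reads_to_center,
                    read_alignments_paf, center_file):
    return ([racon_bin] + shlex.split(racon_params)
            + [reads_to_center, read_alignments_paf, center_file])

# subprocess.check_call(build_racon_cmd(...),
#                       stdout=racon_polished, stderr=racon_stderr)
```

Using `shlex.split` keeps the call in list form, so no shell is involved and quoted parameter values are handled correctly.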