Closed kate-stankiewicz closed 1 year ago
Dear @kate-stankiewicz ,
sorry for the late reply. I sometimes fall behind on answering issues due to other higher priority items.
I will refer to your individual steps above by number, from 1 to 6.
In step 1, you generate a single splicing graph per input bam file. In the same step, you also quantify each single graph in its sample, as the --quantify-graph option is set by default. What you would need in this case is to set the --no-quantify-graph option. This will simply generate the individual sample graphs. If you don't set the --no-quantify-graph option, the single graphs are still generated, but so are the quantifications per single bam file for each respective graph (which you might not want).
In step 2, you merge the single sample graphs into a joint graph. That is, all events detected in all samples will be merged per gene. Here, too, you would want to add the --no-quantify-graph option, as otherwise the quantifications for the merged graph are done per sample, but in a sequential manner. (This is what you would like to parallelise in step 3.)
In step 3, you will now quantify the merged graph with each single bam file. That is, also events detected in only one sample will get quantified in all samples. This is sometimes useful, as the criteria for adding a new event to the splicing graph are quite stringent. So you might not have enough reads in a sample to add a new event, but you have enough to quantify it, if it was detected in another sample. If you do this in parallel over samples, you can speed up this process depending on how many samples you run in parallel.
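As a rough illustration of steps 1 to 3, here is a minimal Python sketch that only assembles the command lines and does not run anything. The function name `plan_cohort_commands` and the file names are hypothetical. The `-o`, `-a`, `-b`, `--merge-strat` and `--qmode` arguments are assumptions taken from the cohort documentation page linked in this thread, not from this reply, so please check them against `spladder build --help` for your installed version.

```python
from typing import Dict, List


def plan_cohort_commands(
    bams: List[str], annotation: str, outdir: str
) -> Dict[str, List[List[str]]]:
    """Assemble (but do not run) the command lines for steps 1 to 3.

    Hypothetical helper for illustration only. All flags except
    --no-quantify-graph, --quantify-graph and --no-extract-ase are
    assumptions based on the SplAdder cohort documentation.
    """
    base = ["spladder", "build", "-o", outdir, "-a", annotation]
    all_bams = ",".join(bams)
    return {
        # Step 1: one graph per BAM file, no quantification yet.
        "1_single_graphs": [
            base + ["-b", bam, "--merge-strat", "single",
                    "--no-extract-ase", "--no-quantify-graph"]
            for bam in bams
        ],
        # Step 2: merge all single graphs, still without quantification.
        "2_merge_graphs": [
            base + ["-b", all_bams, "--merge-strat", "merge_graphs",
                    "--no-extract-ase", "--no-quantify-graph"]
        ],
        # Step 3: quantify the merged graph, one independent job per BAM.
        # These jobs do not depend on each other, so they can be
        # submitted in parallel (for example, one cluster job per sample).
        "3_quantify_per_sample": [
            base + ["-b", bam, "--merge-strat", "merge_graphs",
                    "--no-extract-ase", "--quantify-graph",
                    "--qmode", "single"]
            for bam in bams
        ],
    }


if __name__ == "__main__":
    plan = plan_cohort_commands(
        ["sample1.bam", "sample2.bam"], "annotation.gtf", "spladder_out"
    )
    for step, commands in plan.items():
        for command in commands:
            print(step, " ".join(command))
```

Each inner list can be handed to a job scheduler or to `subprocess.run` as-is. Only the step 3 commands are meant to be run side by side.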
In step 4, you now collect the quantifications from step 3. This should generate genes_graph_conf3.merge_graphs.count.hdf5 and genes_graph_conf3.merge_graphs.gene_exp.hdf5 (assuming you run with the default confidence level 3). However, in your above run, these files were already generated by step 2. Hence nothing else needed to be done. I agree that the verbose mode could indeed be a bit more (or at all) verbose about this ...
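Since the collect step can finish silently, one way to see whether the two files already exist is to look for them in the output directory. The following is a minimal sketch, assuming only the file names mentioned above. The helper name `find_quantification_outputs` is hypothetical, and the search is recursive because the exact subdirectory is not stated here.

```python
from pathlib import Path
from typing import Dict, List


def find_quantification_outputs(
    outdir: str, confidence: int = 3
) -> Dict[str, List[Path]]:
    """Search outdir recursively for the two merged quantification files.

    Hypothetical helper: the file names follow the pattern given in this
    thread. The subdirectory they live in is not assumed.
    """
    names = [
        f"genes_graph_conf{confidence}.merge_graphs.count.hdf5",
        f"genes_graph_conf{confidence}.merge_graphs.gene_exp.hdf5",
    ]
    root = Path(outdir)
    # An empty list for a name means that file was not found anywhere.
    return {name: sorted(root.rglob(name)) for name in names}


if __name__ == "__main__":
    # "spladder_out" is a placeholder for your own output directory.
    for name, hits in find_quantification_outputs("spladder_out").items():
        status = ", ".join(str(h) for h in hits) if hits else "missing"
        print(f"{name}: {status}")
```

Comparing the modification times of these files before and after step 4 would also show whether that step rewrote them.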
I hope this answers your question. Please let me know if you have additional ones.
Best, Andre
Hi Andre,
Thank you for this informative and in-depth explanation. It is really helpful! So if I am understanding correctly, the results between running these steps in the way that I have and running them as you outlined here should not change, correct? The only thing I have missed out on is better parallelization by delaying the graph quantification until a later step?
Also, to possibly save you having to address a similar issue from another user, the --no-quantify-graph option is not listed in the steps for use on large cohorts: https://spladder.readthedocs.io/en/latest/spladder_cohort.html
Only the --no-extract-ase option is listed in the earlier steps. Just wanted to let you know since it seems to be important to include and makes sense as you explained it above.
Hi @kate-stankiewicz ,
Thanks for the hint about the documentation. I will fix this right away. Regarding your statement about completeness: yes, this is true. It only took longer than necessary, but the outputs are the same.
Best,
Andre
Hi @akahles ,
Thanks for the clarification! I will close this issue now.
Description
I have followed the steps for running SplAdder on large cohorts as outlined here https://spladder.readthedocs.io/en/latest/spladder_cohort.html
However, I am not sure what the second phase in the quantification step is doing. On the documentation page it says "As a second step to this phase, we need to collect the individual quantifications and aggregate them in a joint database:". However, when I run this step I get no output and no log information either (even if run in -v mode). Additionally, it only runs for about 30 seconds (all other steps in the workflow run for at least 30 mins, if not days). From what I can tell, no new files are created, nor are any existing files modified. The subsequent steps (event calling and testing) seem to run despite this. Is the step doing anything that might not be visible to me? Or is it possible to skip it?
Many thanks!
What I Did