jakobilab / circtools

circtools: a modular, python-based framework for circRNA-related tools that unifies several functionalities in a single, command line driven software.
http://circ.tools
GNU General Public License v3.0

Issues Regarding Circtools Quickcheck #5

Open hn929 opened 6 months ago

hn929 commented 6 months ago

Dear Tobias,

I am trying to run circtools quickcheck to get an overview of my circRNA counts after mapping and detection with circtools detect. I have let quickcheck run for up to 72 hours, and each time the output log has stalled at the grouping stage. The log looks the same on every run. Is there any way to parallelize the run or otherwise speed up the grouping step?

Output log:

Loading CircRNACount
Loading LinearRNACount
Parsing data
Found 6 data columns in provided data
2 different groups provided
Assuming (1,2),(1,2),(1,2),... sample grouping  

Thank you!

tjakobi commented 6 months ago

Dear @hn929,

Thank you for your report.

The quickcheck module should finish within a matter of minutes, as it only reads the log files produced by STAR and the output generated by circtools detect.

What is the command line that you are using for quickcheck?

Thank you,

Tobias

hn929 commented 6 months ago

Dear @tjakobi,

Thank you for your reply.

The command I am using for quickcheck is: python "$circtools_repo/circtools.py" quickcheck -d "${Input_dir}/" -s "${STAR_dir}/" -o "${Output_Directory}/" -S _trimmed -l Ctrl-,NO- -g 1,1,1,2,2,2 -C _trimmed_Soft_Circtools_Mate_Pairs___Chimeric.out.junction

The Input_dir is the path to the detect output and the STAR_dir is the directory containing the STAR mapping outputs per sample for paired mapping and individual mate mapping.

I have also moved on to the circtest module, where I keep hitting an error stating that no candidates passed the specified thresholds. The command line I executed is: python "$circtools_repo/circtools.py" circtest -d "${Input_dir}/" -l Ctrl,NO -c 4,5,6,7,8,9 -r 3 -p 0.01 -g 1,1,1,2,2,2 -s 3 -C 2 -o "${Output_Directory}/"

I also ran circtest in R using the underlying R package and received this error:

49 candidates processed in total
0 candidates passed the specified thresholds
Warning messages:
1: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

2: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

3: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

4: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

5: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

6: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

7: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

8: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

9: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim).

10: In betabin(cbind(circ, tot - circ) ~ group, ~1, data = testdat) : 
Possible convergence problem. Optimization process code: 10 (see ?optim). 

I have checked my outputs from detect: they are in the correct format and contain data points. However, I repeatedly run into the same error despite my debugging efforts.
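As a sanity check on the circtest command above, the sketch below verifies that the column list (-c), the grouping (-g), and the condition labels (-l) agree with each other. It only inspects the argument strings and is not part of circtools; the function name `check_grouping` is hypothetical, and the assumption that each group number corresponds to exactly one label is mine.

```python
def check_grouping(columns, grouping, labels):
    """Return a list of problems found; an empty list means the arguments agree.

    Hypothetical helper, not part of circtools. It assumes one group
    number per count column and one label per distinct group.
    """
    cols = [int(c) for c in columns.split(",")]
    groups = [int(g) for g in grouping.split(",")]
    names = [n for n in labels.split(",") if n]

    problems = []
    if len(cols) != len(groups):
        problems.append(
            f"{len(cols)} columns but {len(groups)} group assignments"
        )
    if len(set(cols)) != len(cols):
        problems.append("duplicate column index in column list")
    if len(set(groups)) != len(names):
        problems.append(
            f"{len(set(groups))} distinct groups but {len(names)} labels"
        )
    return problems


# Values taken from the circtest command in this thread.
print(check_grouping("4,5,6,7,8,9", "1,1,1,2,2,2", "Ctrl,NO"))
```

For the values used in this thread the list comes back empty, so the arguments are at least internally consistent.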

tjakobi commented 3 months ago

This is not an error but a warning originating from the aod package; see https://github.com/dieterich-lab/CircTest/issues/11#issuecomment-624906847.

Just to follow up on this: the long processing time seems very odd to me. Have you used an exceptionally large dataset?
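One quick way to answer the dataset-size question is to count the rows and columns of the detect output tables. The sketch below is not part of circtools; `summarize_table` is a hypothetical helper, and it assumes a tab-separated file with a single header line. It demonstrates the call on a small temporary file named after the CircRNACount table mentioned in the log; the sample rows are made up for illustration.

```python
import os
import tempfile


def summarize_table(path):
    """Return (data_rows, columns, size_in_bytes) for a tab-separated file.

    Assumes the first line is a header; blank lines are not counted.
    """
    rows = 0
    with open(path, "r", encoding="utf-8") as handle:
        header = handle.readline()
        columns = len(header.rstrip("\n").split("\t")) if header else 0
        for line in handle:
            if line.strip():
                rows += 1
    return rows, columns, os.path.getsize(path)


# Demonstration on a tiny made-up table; replace the path with the real
# file from the detect output directory to inspect actual data.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "CircRNACount")
    with open(demo, "w", encoding="utf-8") as out:
        out.write("Chr\tStart\tEnd\tS1\tS2\n")
        out.write("chr1\t100\t200\t3\t5\n")
        out.write("chr2\t300\t400\t0\t7\n")
    rows, columns, size = summarize_table(demo)
    print(f"{rows} data rows, {columns} columns, {size} bytes")
```

Reporting these three numbers for each input table makes it easy to judge whether the input is unusually large.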