rvolden / C3POa

Computational pipeline for calling consensi on R2C2 nanopore data
GNU General Public License v2.0

Number of repeats used for consensus #18

Open OscarT32 opened 3 years ago

OscarT32 commented 3 years ago

I have generated consensus sequences for different datasets using C3POa. I am trying to do some stats by establishing a correlation between the number of subreads and the accuracy of the consensus. When I split the output file based on the information in the header of each consensus sequence in the C3POa output, I notice a jump from "1" to "3", with no sequences labelled "2", in all of my output files. I have checked my input file and it contains data that should fall into the "2" category. I am not sure why this is happening or if I am misunderstanding the output file. Thanks for your assistance.
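For anyone wanting to reproduce this check, the tally described above can be sketched roughly as follows. This is only an illustration: it assumes the repeat count is the second-to-last underscore-separated field of each FASTA header, which should be verified against your own output, and the function name count_by_repeats and the example headers are made up.

```python
from collections import Counter


def count_by_repeats(fasta_lines, field=-2, sep="_"):
    """Tally consensus reads by the repeat count parsed from each header.

    `field` is an ASSUMPTION about where the repeat count sits in the
    header; adjust it to match the headers in your own output file.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].strip().split(sep)
        try:
            counts[int(parts[field])] += 1
        except (IndexError, ValueError):
            counts["unparsed"] += 1  # header did not match the assumed layout
    return counts


# Made-up headers, for illustration only
example = [
    ">readA_12.3_5000_1_900", "ACGT",
    ">readB_11.8_7200_3_1100", "ACGT",
    ">readC_10.1_6100_3_1000", "ACGT",
]
print(count_by_repeats(example))
```

Passing an open file handle instead of the example list works the same way, since the function only iterates over lines. A missing key for 2 in the result would correspond to the gap described above.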

rvolden commented 3 years ago

What version of C3POa are you using? If you're running something older, I suggest updating to the latest version (v2.2.2). I haven't seen this come up in my test dataset. This is what I see when I plot out the accuracy per coverage bin: [image: 2repswarm]

I think what's probably happening is that there's a bug in the consensus script used for pairwise consensus calling. As far as I know, there shouldn't be any problems with it in the latest version. If you're on the latest C3POa version and you're still not seeing any reads with a coverage of 2, add .get() to the apply_async call on line 247. This will disable threading for the consensus calling, and it will actually show you the errors.
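To illustrate why adding .get() helps, here is a minimal, self-contained sketch. It is not C3POa's code: call_consensus is a hypothetical stand-in for the consensus worker, and multiprocessing.pool.ThreadPool is swapped in for a process pool so the snippet runs anywhere. Both return the same kind of AsyncResult from apply_async, which holds a worker's exception until .get() is called.

```python
from multiprocessing.pool import ThreadPool


def call_consensus(n_subreads):
    # Hypothetical worker: fails only for the 2-subread (pairwise) case
    if n_subreads == 2:
        raise RuntimeError("pairwise consensus failed for 2 subreads")
    return n_subreads


def run(check_results):
    done, errors = [], []
    with ThreadPool(2) as pool:
        handles = [pool.apply_async(call_consensus, (n,)) for n in (1, 2, 3)]
        pool.close()
        pool.join()  # wait for all tasks; exceptions stay inside the handles
    if check_results:
        for handle in handles:
            try:
                done.append(handle.get())  # re-raises the worker's exception
            except RuntimeError as exc:
                errors.append(str(exc))
    return done, errors


print(run(check_results=False))  # nothing is reported, the failure is silent
print(run(check_results=True))   # the failure for 2 subreads is surfaced
```

Calling .get() immediately after each apply_async, as suggested above, also makes the caller wait for each task in turn, which is why it effectively serializes the work while debugging.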

OscarT32 commented 3 years ago

Thanks for your answer. I am not using the latest version of C3POa. I tried to install the latest version, but I seem to have an issue installing "pyabpoa". When I use either of the two commands you indicate for installing the different packages, I get the following warning (I am sorry if it's something simple, I am fairly new to this. I am using Ubuntu 18.04):

pyabpoa

rvolden commented 3 years ago

Do you have Cython installed? pip3 install --user Cython should do the trick. To cover all of your bases, try pip3 install --user --upgrade Cython setuptools wheel. Then you can try to install pyabpoa using pip. If that doesn't work, you can clone the abPOA repo and run make install_py.

OscarT32 commented 3 years ago

Thanks for the suggestions. Installation worked properly! I have started running some data that I ran on previous versions but I am having some issues.

When I use -q to filter the input file this warning is displayed: image

When I remove -q, C3POa starts running, but the run finishes after only a few minutes (this is really fast compared to the previous version, in which the same dataset takes a few hours). When I checked the output, the "R2C2_consensus.fasta" file is really small, with only a few sequences. The log file shows that only a few sequences are actually filtered compared to the total number of sequences (I had already filtered the sequences by size):

image

This is the command line I am using to run C3POa:

image

Once again, thank you for your assistance.

rvolden commented 3 years ago

Can you follow the debug step seen here? https://github.com/rvolden/C3POa/issues/17#issuecomment-783469536

For some reason python multiprocessing doesn't like passing back errors, so it'll just die silently instead of complaining.

OscarT32 commented 3 years ago

I followed the debug step. The following error was displayed: image

rvolden commented 3 years ago

Seems to be a problem with pyabpoa. Can you verify that your install is working correctly? It may have installed, but it could still run into runtime errors.
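One way to check the install outside of C3POa is a small smoke test along these lines. It is a sketch rather than an official diagnostic: check_pyabpoa and the toy sequences are made up for illustration, and the msa_aligner / msa(..., out_cons=..., out_msa=...) calls follow the usage shown in the abPOA README, so they may need adjusting for other pyabpoa versions.

```python
def check_pyabpoa():
    """Return (ok, message) after trying a tiny consensus call with pyabpoa."""
    try:
        import pyabpoa as pa
    except ImportError as exc:
        return False, f"import failed: {exc}"
    try:
        # Toy sequences, for illustration only
        seqs = [
            "ACGTTGCATGCCGATAGCTAGGCT",
            "ACGTTGCATGCCGTTAGCTAGGCT",
            "ACGTTGCATGCCGATAGCTAGGCT",
        ]
        aligner = pa.msa_aligner()
        result = aligner.msa(seqs, out_cons=True, out_msa=False)
        return True, f"consensus: {result.cons_seq[0]}"
    except Exception as exc:
        # Installed but failing at runtime, as described above
        return False, f"runtime error: {type(exc).__name__}: {exc}"


ok, message = check_pyabpoa()
print(ok, message)
```

If this prints False with an import or runtime error, the problem lies in the pyabpoa install itself rather than in C3POa.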

OscarT32 commented 3 years ago

Thanks for your help. Indeed, the problem was with the pyabpoa install. Now C3POa is running properly, but when I try to use -q 9 it reports an unrecognized argument. When I use -h, the -q argument is not listed. When I leave it out, C3POa runs without any issues. image

rvolden commented 3 years ago

Yeah, we took out that option since ONT qscores are mostly nonsensical.