kharchenkolab / dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data
GNU General Public License v3.0

FillCollisionsAdjustmentInfo is taking too long to complete #51

Open k3yavi opened 6 years ago

k3yavi commented 6 years ago

Hi @VPetukhov , I am trying to use dropEst with one of the 10x datasets from here. I downloaded the BAM from the above link and used the dropest -f mode to generate the count matrix and the rds file for the downstream UMI correction. Unfortunately I am a noob in R and am just copy-pasting code from this tutorial. I have a couple of questions regarding the workings of the dropEst pipeline:

  1. When we provide an external BAM (like the one from the 10x website in my case), what type of UMI correction does the dropest command do before generating the count matrix? I am guessing it simply counts the tags from the BAM if I don't give either -m or -M?
  2. When I try to use the dropestr R package following the above tutorial, it seems to give a good UMI distribution up to here. But once I try to use the function
    collisions_info <- FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene, step=20, mc.cores=5, verbose=T)

    the program gives the following error:

    > FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene, step=20, mc.cores=5, verbose=T)
    Error in FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene,  :
    unused arguments (step = 20, mc.cores = 5, verbose = T)

    So I removed step = 20, mc.cores = 5, verbose = T, but it has been 3 days now and the program still hasn't finished. (A way to check which arguments the installed version actually accepts is sketched below.)
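    A minimal sanity check, assuming only that dropestr is installed, is to inspect the signature of the installed function before calling it:

    # Base R introspection; nothing dropestr-specific is assumed here
    # beyond the function name used in the tutorial.
    library(dropestr)
    args(FillCollisionsAdjustmentInfo)
    names(formals(FillCollisionsAdjustmentInfo))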

What am I doing wrong? Thanks again for your help.

my config.xml

<config>
    <!-- droptag -->
    <TagsSearch>
        <protocol>10x</protocol>
        <BarcodesSearch>
            <barcode1_length>8</barcode1_length>
            <barcode2_length>16</barcode2_length>
            <umi_length>10</umi_length>
            <r1_rc_length>0</r1_rc_length>
        </BarcodesSearch>

        <Processing>
            <min_align_length>10</min_align_length>
            <reads_per_out_file>10000000</reads_per_out_file>
            <poly_a_tail>AAAAAAAA</poly_a_tail>
        </Processing>
    </TagsSearch>

    <!-- dropest -->
    <Estimation>
        <Merge>
            <barcodes_file>/mnt/scratch5/avi/dropest_data/barcodes.tsv</barcodes_file>
            <barcodes_type>const</barcodes_type>
            <min_merge_fraction>0.2</min_merge_fraction>
            <max_cb_merge_edit_distance>2</max_cb_merge_edit_distance>
            <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
            <min_genes_after_merge>100</min_genes_after_merge>
            <min_genes_before_merge>20</min_genes_before_merge>
        </Merge>

        <PreciseMerge>
            <max_merge_prob>1e-5</max_merge_prob>
            <max_real_merge_prob>1e-7</max_real_merge_prob>
        </PreciseMerge>
    </Estimation>

    <BamTags> <!-- Optional. Tags, which are used to parse .bam file (-f option) or to print tagged .bam file (-b or -F options). Default values correspond to 10x protocol. -->
        <cb>CB</cb> <!-- Cell barcode. Default: CB. -->
        <cb_raw>CR</cb_raw> <!-- Cell barcode raw. Used only for bam output. Default: CR. -->
        <umi>UB</umi> <!-- UMI. Default: UB. -->
        <umi_raw>UR</umi_raw> <!-- UMI raw. Used only for bam output. Default: UR. -->
        <gene>GX</gene> <!-- Gene id. Default: GX. -->
        <cb_quality>CQ</cb_quality> <!-- Cell barcode quality. Default: CQ. -->
        <umi_quality>UQ</umi_quality> <!-- UMI quality. Default: UQ. -->
    </BamTags>
</config>
VPetukhov commented 6 years ago

Hi @k3yavi , Hmm, this function should run really fast. Can you please give me the values of max_umi_per_gene and length(umi_probabilities)?

k3yavi commented 6 years ago

Hi @VPetukhov , The experiment has the following numbers:

> max_umi_per_gene
[1] 4192617
> length(umi_probabilities)
[1] 1048576
VPetukhov commented 6 years ago

4 million UMIs per gene? That's definitely not a correct number. Can you please publish your dropEst log?
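For context: the config above sets umi_length to 10, and 10-nt UMIs allow only 4^10 distinct sequences, which is exactly length(umi_probabilities). A quick check in R shows why 4,192,617 UMIs for one gene cannot be right:

# 10-nt UMIs (per <umi_length>10</umi_length> in the config) span
# a space of 4^10 possible sequences.
4^10
#> [1] 1048576

# The reported per-gene maximum exceeds the entire UMI space,
# which suggests it is counting reads rather than unique UMIs.
4192617 > 4^10
#> [1] TRUE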

k3yavi commented 6 years ago

Unfortunately I can't find the log of the run, but I just used the BAM from the 10x website and the config attached above. Attaching the report.html file.

report.html.zip

k3yavi commented 5 years ago

Hi, I was wondering if there has been any luck resolving this issue?

VPetukhov commented 5 years ago

There were several problems with UMI correction, but it should work now. Please check the result I published for neurons_900. I fixed your config so that the pipeline reads non-corrected UMIs and their qualities.
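As a sketch of what that change plausibly looks like, assuming the fix was to point BamTags at the raw (non-corrected) tags documented in the config comments above:

<!-- Hypothetical sketch: read raw UMIs and their qualities from the
     10x BAM instead of the corrected UB tag. Tag names follow the
     BamTags comments in the config posted earlier. -->
<BamTags>
    <umi>UR</umi>                 <!-- raw UMI; UB is the corrected one -->
    <umi_quality>UQ</umi_quality> <!-- UMI base qualities -->
</BamTags>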

k3yavi commented 5 years ago

Thanks @VPetukhov for looking into it, I'll try running the tool again and report back.

k3yavi commented 5 years ago

@VPetukhov , With the latest version of dropEst, I get the following error.

/mnt/scratch5/avi/alevin/bin/dropest/dropEst/build/dropest -f -c est_config.dump.xml -C 1200 ./possorted_genome_bam.bam
Version: 0.8.5.
Run: 02/02/2019 16:27:07.
No such node (max_cb_merge_edit_distance)

I am using the same XML file that you shared for the neurons_900 dataset in https://github.com/hms-dbmi/dropEst/issues/68 and got the BAM from the 10x website.

The weirdest thing is that, with every run of dropEst, the XML file gets cleared out. I double-checked the content of the XML file before giving it as input to dropEst, but the program fails with the above error.

Thanks in advance for your help.

k3yavi commented 5 years ago

OK, I have a workaround: if I make the XML read-only, it seems to work fine, at least up to the generation of the rds file. Now checking dropestr for the UMI correction.
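Concretely, the workaround amounts to something like the following (reusing the command from above; the read-only bit is the only change):

# Make the config read-only so dropest cannot overwrite it,
# then rerun the same command as before.
chmod a-w est_config.dump.xml
dropest -f -c est_config.dump.xml -C 1200 ./possorted_genome_bam.bam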

k3yavi commented 5 years ago

I can use dropestr to generate UMI corrected counts now. Thanks for the help.

VPetukhov commented 5 years ago

The weirdest thing is that, with every run of dropEst, the XML file gets cleared out.

If you used the XML from the same folder and with the same name as in the archive, it's normal behavior. It's a dump of the config file, and the pipeline dumps it again, so you just need to move or rename it.
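In other words (filenames are illustrative), work from a copy of the dump rather than the dump itself:

# dropest re-dumps its config on every run, so copy the dump under
# a new name before feeding it back in.
cp est_config.dump.xml my_config.xml
dropest -f -c my_config.xml -C 1200 ./possorted_genome_bam.bam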

k3yavi commented 5 years ago

Ah, got it. Thanks for the clarification.

k3yavi commented 5 years ago

I think I closed this issue a little too early, without properly going through the logs; sorry for that. In my defense, the output does get generated, but with the following warning, which I missed earlier:

Warning message:
In parallel::mclapply(..., mc.cores = GetMcCores(mc.cores)) :
  all scheduled cores encountered errors in user code

I am not sure how this warning affects the pipeline and the output count matrix. I tried using only 1 core to avoid multithreading, and the program fails with the following error:

Correcting UMI sequence errors.
Estimating UMIs distribution... Completed.
Filling collisions info...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Completed.
Filling info about adjacent UMIs...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Done!
Correction info prepared!

Estimating prior error probabilities... Completed.
Correcting UMI sequence errors...Error in `$<-.data.frame`(`*tmp*`, "IsMerged", value = logical(0)) :
  replacement has 0 rows, data has 4
Calls: CorrectUmiSequenceErrors ... lapply -> FUN -> PredictBayesian -> $<- -> $<-.data.frame
Execution halted
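For what it's worth, the error message itself is a generic data.frame safeguard, reproducible in plain R without dropestr:

# Assigning a zero-length vector as a column of a non-empty
# data.frame raises exactly this error in base R.
df <- data.frame(x = 1:4)
df$IsMerged <- logical(0)
#> Error in `$<-.data.frame`(`*tmp*`, "IsMerged", value = logical(0)) :
#>   replacement has 0 rows, data has 4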

I also tried the step-by-step procedure instead of the quick one. In that case I get the following error:

> cm <- CorrectUmiSequenceErrors(reads_per_umi_per_cell, umi.probabilities=umi_probabilities, collisions.info=collision_info,
+                                correction.info=correction_info, mc.cores=1, verbosity.level=2)
Correcting UMI sequence errors.

Estimating prior error probabilities... Completed.
Correcting UMI sequence errors...Error in GetSmallerNeighboursDistributionsBySizes(dp.matrices, larger.nn,  :
  _Map_base::at
>
evanbiederstedt commented 4 years ago

Hey @k3yavi

If you could provide more information so this could be debugged, I would appreciate it.

Thanks, Evan