nloyfer / wgbs_tools

tools for working with Bisulfite Sequencing data while preserving reads intrinsic dependencies

find_markers error #34

Closed dengyihan1464 closed 1 year ago

dengyihan1464 commented 1 year ago

Thank you for the tool!

I ran into a problem when running:

$ wgbstools find_markers -g groups.csv --betas GSE186458_RAW/*.hg38.beta -b blocks..bed.gz --min_cpg 5 --min_bp 10 --max_bp 1500 -c 10

The error is:

py:87: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  blocks_df[b] = dres[b]
Invalid input argument
`popmean.shape[axis]` must equal 1.

I would greatly appreciate it if you could take some time to look into this for me!

dengyihan1464 commented 1 year ago

A possible cause of the issue is the installed numpy version.

GWW commented 1 year ago

There is a simple code fix for this, on lines 246-249:

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values.flatten(), axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values.flatten(), axis=1, nan_policy='omit')

The flatten() call needs to be removed:

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values, axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values, axis=1, nan_policy='omit')
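A minimal sketch of why the shape matters, assuming a SciPy version that requires an array-valued `popmean` to have length 1 along `axis`. The data and the names `background` and `target` are made up for illustration; `target` stands in for what `.values` returns for a single-column selection, which is 2-D with shape `(n, 1)`. Calling `flatten()` turns it into shape `(n,)`, which no longer has an axis 1.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical data: 3 blocks (rows) x 4 background samples (columns).
background = np.array([
    [0.10, 0.20, 0.15, 0.25],
    [0.80, 0.90, 0.85, 0.95],
    [0.40, 0.50, 0.45, 0.55],
])

# A single target sample, kept 2-D like `.values` of a one-column selection.
target = np.array([[0.90], [0.10], [0.50]])
print(target.shape)            # (3, 1)
print(target.flatten().shape)  # (3,)

# 2-D popmean: each row of `background` is tested against that row's target value.
res = ttest_1samp(background, target, axis=1, nan_policy='omit')
t_stat = np.asarray(res.statistic)

# Cross-check with the textbook one-sample t statistic, computed row by row.
n = background.shape[1]
manual_t = (background.mean(axis=1) - target[:, 0]) / (
    background.std(axis=1, ddof=1) / np.sqrt(n)
)
print(np.allclose(t_stat, manual_t))

# Flattened popmean: on affected SciPy versions this raises the error from the
# report; the exact behaviour depends on the version, so it is only captured here.
try:
    ttest_1samp(background, target.flatten(), axis=1, nan_policy='omit')
    flat_error = None
except Exception as exc:
    flat_error = str(exc)
print(flat_error)
```

Older SciPy releases may behave differently with a 2-D `popmean`, so it is worth checking the installed version before applying the change.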