entries of params were nan which will throw error in lstsq

theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.

https://diffxpy.rtfd.io

BSD 3-Clause "New" or "Revised" License

entries of params were nan which will throw error in lstsq #148

Open QianjiangHu opened 4 years ago

QianjiangHu commented 4 years ago

Hi, I get the warning message "entries of params were nan which will throw error in lstsq" when I run the test on my AnnData object with this code: test = de.test.two_sample(YO_adata_AT2, grouping='grouping', test='wald', noise_model="nb"). What is causing this, and how can I fix it?

Thank you!

davidsebfischer commented 4 years ago

Hi @Qianjiang-Github, sorry for the delay!

Is there any nan in your input data?
Which diffxpy and batchglm version are you using?

KeitaSaeki commented 4 years ago

Hi,

I also encountered the same problem. I guess it is because of the funny "0" in the GEM barcode column. I would appreciate it if someone could help me remove it.

Best,

Keita

davidsebfischer commented 4 years ago

@KeitaSaeki, I am not sure whether this is really the same underlying issue:

> I guess it is because of the funny "0" in the GEM barcode column.

You can probably get rid of it by defining the dataframe with an index when passing it to anndata. Right now you exert relatively little control over the nature of the dataframe because you simply call its constructor with a pandas Series.
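A minimal sketch of the difference described above, using pandas only. The barcodes, the group labels, and the column name "grouping" are made up for illustration; how the resulting frame is then handed to anndata is left out here.

```python
import pandas as pd

# Hypothetical barcodes and group labels, for illustration only.
barcodes = ["AAAC-1", "AAAG-1", "AACT-1"]
groups = pd.Series(["ctrl", "ctrl", "treated"])

# Calling the constructor on an unnamed Series gives a single column
# labelled 0 and a default integer index.
obs_implicit = pd.DataFrame(groups)

# Naming the column and passing the barcodes as the index keeps
# control over both.
obs_explicit = pd.DataFrame(
    {"grouping": groups.to_numpy()},
    index=pd.Index(barcodes),
)

print(obs_implicit.columns.tolist())
print(obs_explicit.columns.tolist())
```

The first frame is one plausible source of a stray "0" column label; whether that is what happened in this case is an assumption.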

marcellp commented 4 years ago

Unfortunately, I am currently facing the exact same problem. I am trying to use this with an AnnData object:

        res = de.test.two_sample(
            self.adata, grouping="de_base", test="wald", noise_model="nb",
        )

and this error is thrown.

I am investigating it further to find a specific way to reproduce it, but it is certainly an issue. I also made sure I have no NaNs anywhere:

        print(np.argwhere(np.isnan(self.adata.X)))
        print(self.adata.obs.isnull().sum().sum())
        print(self.adata.var.isnull().sum().sum())

all return:

[]
0
0

I just installed diffxpy, so it should be on the latest version available via pip.
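A slightly broader version of the check above, as a sketch: np.isnan does not flag infinities, so counting both may be worthwhile. The helper name count_nonfinite and the toy matrix are hypothetical, and the input is assumed to be a dense array. Note that the warning text refers to params, so a clean input does not by itself rule out NaNs arising during fitting.

```python
import numpy as np

def count_nonfinite(X):
    """Count NaN and +/-inf entries in a dense array (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return {
        "nan": int(np.isnan(X).sum()),
        "inf": int(np.isinf(X).sum()),
    }

# Toy matrix standing in for adata.X; the inf entry is not caught by np.isnan.
X = np.array([[1.0, 0.0], [np.inf, 3.0]])
report = count_nonfinite(X)
print(report)
```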

alitinet commented 4 years ago

Hi folks,

I also had the same issue when trying to run de.test.wald. As @KeitaSaeki suggested, the problem appeared to be the name of the index column: in my case the index was named Cell_Index, and the resulting empty line created NaNs. You can set the name to None, which also removes the empty line, by running adata.obs = adata.obs.rename_axis(None); after that everything works fine.

Thanks, David, for the great package!

Update: scratch that, it only worked for one dataset and doesn't work for others.
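For reference, a small pandas-only sketch of what rename_axis(None) does to a frame whose index carries a name. The toy frame below is hypothetical and stands in for adata.obs; as the update above says, this did not resolve the warning in general.

```python
import pandas as pd

# Toy stand-in for adata.obs with a named index.
obs = pd.DataFrame(
    {"grouping": ["a", "a", "b"]},
    index=pd.Index(["cell1", "cell2", "cell3"], name="Cell_Index"),
)

# rename_axis(None) returns a new frame whose index has no name;
# the rows and columns are unchanged.
obs_unnamed = obs.rename_axis(None)

print(obs.index.name)
print(obs_unnamed.index.name)
```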

dawe commented 4 years ago

I'm facing the very same issue, and the index name is not the culprit. If you instead use scaled data (e.g., after sc.pp.scale), the NB estimator introduces NaNs. I tried running with unscaled data and it works perfectly. @davidsebfischer, I see that only 'nb' is accepted as the noise_model parameter. I understand that batchglm supports Gaussian noise; would it be useful to allow it in the de.test.wald function?
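One way to tell whether a matrix has already been scaled is to test whether it still looks like raw counts, that is, finite, non-negative, and integer-valued. The sketch below uses NumPy only; the helper name looks_like_raw_counts, the tolerance, and the simulated data are assumptions, and the column-wise standardization is only similar in spirit to sc.pp.scale.

```python
import numpy as np

def looks_like_raw_counts(X, tol=1e-8):
    """Heuristic: finite, non-negative, integer-valued entries (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if not np.isfinite(X).all():
        return False
    if (X < 0).any():
        return False
    return bool((np.abs(X - np.round(X)) <= tol).all())

# Simulated counts standing in for an unscaled expression matrix.
rng = np.random.default_rng(0)
counts = rng.negative_binomial(n=5, p=0.3, size=(100, 4)).astype(float)

# Column-wise standardization: zero mean and unit variance per gene.
scaled = (counts - counts.mean(axis=0)) / counts.std(axis=0)

print(looks_like_raw_counts(counts))
print(looks_like_raw_counts(scaled))
```

Standardized values are centered on zero, so they include negative and fractional entries, which a negative binomial noise model is not defined for.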

xpastor commented 4 years ago

Hi, I'm facing the same problem running de.test.wald. If I run np.isnan(np.sum(adata.raw.X.toarray())) it returns False.
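For large sparse matrices, the same check can be done without calling toarray(), since implicit zeros are always finite and only the stored values need inspecting. A sketch assuming SciPy is available; the helper name has_nonfinite and the toy matrices are hypothetical.

```python
import numpy as np
from scipy import sparse

def has_nonfinite(X):
    """True if X contains NaN or inf; works for dense or sparse input (hypothetical helper)."""
    if sparse.issparse(X):
        # Only explicitly stored entries can be non-finite.
        return bool(not np.isfinite(X.tocsr().data).all())
    return bool(not np.isfinite(np.asarray(X, dtype=float)).all())

# Toy matrices standing in for adata.raw.X.
clean = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
dirty = sparse.csr_matrix(np.array([[0.0, np.nan], [2.0, 0.0]]))

print(has_nonfinite(clean))
print(has_nonfinite(dirty))
```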

ywen1407 commented 4 years ago

having the same issue here...

fairliereese commented 4 years ago

Same here. I tried removing the index names as suggested and also checked for NaNs in my data using @xpastor's line of code, which returned False.

vladie0 commented 3 years ago

Same issue here. Judging by the recent update and maintenance activity, I don't think this package is maintained anymore.

bsierieb1 commented 3 years ago

Exact same issue with my data, while everything works fine with simulated data from the tutorial.

My code:

test = de.test.wald(
    data=adata,
    formula_loc="~1+myfactor",
    factor_loc_totest="myfactor"
)

There are no NaNs in my data matrices (both np.count_nonzero(np.isnan(adata.X)) and np.count_nonzero(np.isnan(adata.raw.X)) return 0). Also, the cell index column in adata.obs has no name, so that cannot be an issue either.

rojinsafavi commented 3 years ago

any updates?

teryyoung commented 1 year ago

Same problem here, hoping for a solution.