meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data

Metadata is not masked when data is masked due to too few data points. #21

Closed jonasjonker closed 3 years ago

jonasjonker commented 3 years ago

Hello. I'm using FlashWeave for my master's thesis, and I think I ran into a bug.

I discovered it while familiarizing myself with your software by practicing on a small dataset from Qiita: [ID 1001].

I posted the error message below:

netw_results = FlashWeave.learn_network(data_path, metadata_path,
                                        verbose       = true,
                                        sensitive     = true,  # causes an error if true
                                        heterogeneous = false)

ArgumentError: number of rows of each array must match (got (23, 26))
in top-level scope at Repos/Thesis/src/scripts/1001/1001makegraph.jl:82
in  at FlashWeave/464SQ/src/learning.jl:293
in #learn_network#123 at FlashWeave/464SQ/src/learning.jl:316
in  at FlashWeave/464SQ/src/learning.jl:383
in #learn_network#124 at FlashWeave/464SQ/src/learning.jl:439
in normalize_data##kw at FlashWeave/464SQ/src/preprocessing.jl:515 
in #normalize_data#159 at FlashWeave/464SQ/src/preprocessing.jl:532
in  at FlashWeave/464SQ/src/preprocessing.jl:461
in #preprocess_data_default#158 at FlashWeave/464SQ/src/preprocessing.jl:466
in preprocess_data##kw at FlashWeave/464SQ/src/preprocessing.jl:326 
in #preprocess_data#153 at FlashWeave/464SQ/src/preprocessing.jl:437
in hcat at stdlib/v1.5/SparseArrays/src/sparsevector.jl:1078
in typed_hcat at base/abstractarray.jl:1391 
in _typed_hcat at base/abstractarray.jl:1404

I read your code and I think I understand what is going wrong. adaptive_pseudocount!() masks and removes samples from the matrix. This function is called indirectly in preprocess_data() (via adaptive_clr!() and clrnorm()). However, after this point the masked matrix is concatenated (via hcat) with the unmasked metadata. Because the two matrices no longer have the same number of rows at this point (23 vs. 26), an error is thrown.
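
The mismatch described above can be reproduced outside FlashWeave with a few lines of plain Julia. This is only a sketch under stated assumptions: the toy matrices, the `keep_mask` variable and the filtering threshold are made up for illustration, and none of this is FlashWeave's actual internal code or the fix that was later pushed.

```julia
# Illustrative only: toy data, not FlashWeave internals.
# 26 samples (rows), 4 abundance columns and 2 metadata columns.
n_samples = 26
otu  = ones(Int, n_samples, 4)
meta = zeros(Int, n_samples, 2)

# Pretend three samples have too few non-zero entries and get masked out.
otu[1:3, :] .= 0
keep_mask = vec(sum(otu .> 0, dims=2)) .>= 2   # hypothetical threshold

otu_masked = otu[keep_mask, :]                 # 23 x 4

# Concatenating with the unmasked metadata fails: 23 rows vs. 26 rows.
mismatch_error = try
    hcat(otu_masked, meta)
    nothing
catch err
    err
end

# Applying the same row mask to the metadata makes the shapes agree.
meta_masked = meta[keep_mask, :]               # 23 x 2
combined = hcat(otu_masked, meta_masked)       # 23 x 6
```

The idea is that whatever row mask is applied to the abundance data would also have to be applied to the metadata before the two are joined.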

Preemptively dropping metadata columns doesn't work either:

netw_results = FlashWeave.learn_network(data_path, meta_data_with_dropped_cols,
                                        verbose       = true,
                                        sensitive     = true,  # causes an error if true
                                        heterogeneous = false)

AssertionError: observations of data do not fit meta_data: 26 vs. 23
in top-level scope at Repos/Thesis/src/scripts/1001/1001makegraph.jl:82
in  at FlashWeave/464SQ/src/learning.jl:293
in #learn_network#123 at FlashWeave/464SQ/src/learning.jl:303
in  at FlashWeave/464SQ/src/misc.jl:13
in #check_data#46 at FlashWeave/464SQ/src/misc.jl:13

I haven't thought of an (easy) fix yet, but if you like, I don't mind spending some time on it.

I can provide you with my exact code if you'd like to reproduce the error.

jtackm commented 3 years ago

Hi Jonas, thanks for the feedback, great catch! I just pushed a fix. Could you let me know if this works now? To get these changes, you can enter the following at Julia's Pkg prompt:

(v1.2) pkg> add FlashWeave#master

jonasjonker commented 3 years ago

Hey Janko,

Thank you for the fix! I ran my code with the update and it solved the error.

jtackm commented 3 years ago

Great! I should be able to tag a new version this week; then you won't have to be on master for this.