meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data

integer as category instead of numeric #16

Closed tapj closed 4 years ago

tapj commented 4 years ago

Hi,

When I use FlashWeave with metadata, an integer column such as "age" is treated as a category and one-hot encoded. How can I indicate which metadata columns are numeric and which are categorical?

This is related to issue https://github.com/meringlab/FlashWeave.jl/issues/12, but it is not solved for me.

Thank you!

jtackm commented 4 years ago

FlashWeave currently one-hot encodes only string columns as factors, more specifically columns that are identified as strings by Julia's readdlm function (see DelimitedFiles.jl). We may support more fine-grained control in the future, but for now, replacing the age values in your input file with strings (e.g. 18 -> A18, 60 -> A60) should get you the desired behaviour.
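To see how readdlm types the cells of a metadata table, here is a minimal sketch. The table contents, the column names SITE and AGE_LABEL, and the helper is_string_column are all hypothetical and only illustrate the idea; FlashWeave's internal check may differ.

```julia
using DelimitedFiles

# Hypothetical metadata table: AGE is numeric, SITE is text,
# AGE_LABEL is the string-prefixed variant of AGE (18 -> A18).
raw = """
AGE\tSITE\tAGE_LABEL
18\tgut\tA18
32\toral\tA32
60\tgut\tA60
"""

# With header=true, readdlm returns the data cells and the header row separately.
data, header = readdlm(IOBuffer(raw), '\t'; header=true)

# Hypothetical helper (not FlashWeave's implementation):
# a column counts as a string column if every cell was parsed as a string.
is_string_column(col) = all(x -> x isa AbstractString, col)

for (j, name) in enumerate(vec(header))
    kind = is_string_column(data[:, j]) ? "string" : "numeric"
    println(name, " => ", kind)
end
```

In this sketch, cells that readdlm can parse as numbers stay numeric, while prefixed values such as A18 are read as strings, which is what the renaming suggestion above relies on.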

tapj commented 4 years ago

Sorry, I may not have been clear. Actually, I have the reverse issue.

My age column looks like this:

18
18
32
37
60
60
60

but FlashWeave interprets this as a category, not as numbers.

So my workaround for now is to add a small random value, like this:

18.000001
18.000002
32.000001
60.000005
60.000003

and then FlashWeave interprets this as numeric, as expected. Is there a cleverer way?

jtackm commented 4 years ago

I see. This should indeed have been fixed with issue #12; adding small floats should definitely not be necessary. If I put that exact column inside a metadata file like this:

AGE
18
18
32
37
60
60

AGE is not one-hot encoded for me. I assume you are still on #master or version v0.16.0? If so, could you run ] + test FlashWeave (this will take a couple of minutes) to see whether the tests that should catch this are passing?
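For reference, the same check can be run from a script rather than the REPL's package mode. This is a sketch that assumes FlashWeave is already installed in the active environment:

```julia
using Pkg

# Show which FlashWeave version (or branch) the active environment uses.
Pkg.status("FlashWeave")

# Run the package's test suite; per the comment above, this may take a couple of minutes.
Pkg.test("FlashWeave")
```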

jtackm commented 4 years ago

Closing this for now; please let me know if there is any news.