Closed murphymadeleine21 closed 3 years ago
Merging #288 (86d23bd) into master (1f29053) will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## master #288 +/- ##
=======================================
Coverage 87.97% 87.97%
=======================================
Files 7 7
Lines 449 449
=======================================
Hits 395 395
Misses 54 54
| Flag | Coverage Δ |
|---|---|
| unittests | 87.97% <ø> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Okay, I can do that. But is there maybe some reasonable way to throw out outlier values? I don't love the arbitrary y-scale.
By outliers do you mean negative numbers? Are they coming from a specific measurement? Potentially you can just cut off the negative numbers to be zero, but I'd like to know how common they are before doing that.
Actually, one other thing. It'd be good to have IgG on the x-axis, and the subject groups be represented by different colors.
There are some negative numbers, yes, and I think one incredibly large number (like in the thousands). I could try to set all values outside of a certain range to 0 or nan?
It looks like there are 64 negative values, and I don't see any specific trend across classes that seems to be causing them. However, IgG4 accounts for 54 of the negative values.
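A minimal sketch of how the per-subclass count of negative values could be tallied. The `measurements` list and the `count_negatives` helper are hypothetical and the numbers are made up for illustration; they are not the project's data or code.

```python
from collections import Counter

# Hypothetical (subclass, ratio) measurements; values are illustrative only.
measurements = [
    ("IgG1", 0.8),
    ("IgG2", -0.1),
    ("IgG3", 1.4),
    ("IgG4", -0.3),
    ("IgG4", -2.0),
    ("IgG4", 0.5),
]


def count_negatives(rows):
    """Count how many negative values each subclass contributes."""
    return Counter(subclass for subclass, value in rows if value < 0)


negatives = count_negatives(measurements)
total_negative = sum(negatives.values())

# Report subclasses from most to fewest negative values.
for subclass, n in negatives.most_common():
    print(f"{subclass}: {n} of {total_negative} negative values")
```

Grouping the same count by class label instead of subclass would be one way to check whether the negatives cluster in a particular subject group.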
Think you can clip them to be 0. We do this in the factorization.
Also, what is the conclusion here? Does the ratio distinguish progression? We should get you past plotting on its own toward making analysis determinations.
To me, I see a few patterns:
Well, the task is to distinguish non-progression and progression, so I think you'll probably need to build the prediction model and see how well it does.
Based on that, we can then include this in the supplement and adjust our manuscript's language to address the reviewer comment. They wanted us to soften our language that it would be "nearly impossible" to identify this antigen trend. I'd like to see your text suggestions for addressing this.
Great! I'm happy to start thinking more critically and on my own about things, and I can get started on seeing what we can find with the predictions.
In terms of building the prediction model, should I be trying to predict class using these average ratio values (aka, this subsetted and averaged dataset)?
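One possible way to shape the averaged ratios for a prediction model is one row per subject with the four IgG ratios as features. This is only a sketch under assumptions: the column layout, subject IDs, class labels, and the `to_feature_table` helper are all hypothetical, and it does not settle which dataset should actually be used.

```python
from collections import defaultdict

SUBCLASSES = ("IgG1", "IgG2", "IgG3", "IgG4")

# Hypothetical long-format rows: (subject, class label, subclass, averaged ratio).
rows = [
    ("s1", "progressor", "IgG1", 0.9),
    ("s1", "progressor", "IgG2", 0.4),
    ("s1", "progressor", "IgG3", 1.2),
    ("s1", "progressor", "IgG4", 0.0),
    ("s2", "non-progressor", "IgG1", 0.3),
    ("s2", "non-progressor", "IgG2", 0.6),
    ("s2", "non-progressor", "IgG3", 0.8),
    ("s2", "non-progressor", "IgG4", 0.1),
]


def to_feature_table(rows):
    """Pivot long-format rows into one feature vector and one label per subject.

    Assumes every subject has a value for each subclass in SUBCLASSES;
    a missing subclass would raise a KeyError.
    """
    features = defaultdict(dict)
    labels = {}
    for subject, label, subclass, ratio in rows:
        features[subject][subclass] = ratio
        labels[subject] = label
    subjects = sorted(features)
    X = [[features[s][sc] for sc in SUBCLASSES] for s in subjects]
    y = [labels[s] for s in subjects]
    return subjects, X, y


subjects, X, y = to_feature_table(rows)
```

The resulting `X` and `y` are in the shape most classifier libraries expect, so they could be passed to whichever model ends up being chosen.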
Also, there are a handful of values > 25 (about 18?) and about 57 values > 10. Only 4 values are > 100. I'm not entirely sure what the right upper bound for the clipping should be. There are ~700 measurements total.
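A short sketch of how the counts above each candidate cutoff could be tabulated when weighing an upper bound. The `values` list and the `count_above` helper are hypothetical; the numbers are made up and do not reproduce the counts quoted above.

```python
def count_above(values, thresholds):
    """Return, for each threshold, how many values are strictly greater than it."""
    return {t: sum(1 for v in values if v > t) for t in thresholds}


# Hypothetical ratio values, including one extreme outlier.
values = [-0.5, 0.2, 3.0, 12.0, 30.0, 150.0, 7000.0]

counts = count_above(values, (10, 25, 100))
for threshold, n in counts.items():
    share = n / len(values)
    print(f"> {threshold}: {n} values ({share:.1%} of all measurements)")
```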
Why would it be something other than 0?
Alternatively, could just clip the values > 100, but cut off what the graph shows at ~15.
Well, I wasn't sure if clipping large values to 0 also made sense, if that is what you're asking? I more meant, at what point do we say a value is too large?
No, you should only clip negative values to zero.
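To make the two options in this thread concrete, here is a minimal sketch of clipping only negatives to zero, with an optional upper cap for comparison. The `clip` helper, the sample values, and the cap of 100 and display limit of 15 are illustrative assumptions taken from the discussion, not the repository's implementation. With NumPy arrays, `np.clip` would likely serve the same purpose.

```python
def clip(values, lower=0.0, upper=None):
    """Clip values below `lower` up to `lower`; optionally cap values at `upper`."""
    clipped = []
    for v in values:
        v = max(v, lower)
        if upper is not None:
            v = min(v, upper)
        clipped.append(v)
    return clipped


# Hypothetical ratio values, including a negative and an extreme outlier.
raw = [-0.4, 0.7, 12.0, 150.0, 7000.0]

# Option 1: only negatives are moved to zero; large values are left untouched.
negatives_only = clip(raw)

# Option 2: additionally cap large values at 100.
both_sides = clip(raw, upper=100.0)

# Points that a y-axis limit of 15 would hide, whichever clipping is used.
hidden_by_ylim = [v for v in both_sides if v > 15]
```

Keeping the display limit separate from the clipping means the underlying values used for analysis are unchanged by how the figure is cropped.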
Right, okay, that's what I have it doing (only clipping negatives to 0), but I also have it clipping values > 100 to 100. We have one value that's ~7000, which I think should be thrown out somehow. But if we keep these large values, we have to cut some values out of what we show in the graph (for example, by setting ylim = 15). You can see this in the next commit.
I also changed my gitignore file and didn't know if I should be committing that too?
Yes
Right now, every subject has 4 values, all of which fall into one of the class categories (because each subject has one class) and differ only in which IgG we looked at. Some values are cut out because they seemed like major outliers. For now I just set the axis to -0.5 to 2. I requested Dr. Meyer as a reviewer since this seemed to be a figure he knew about, but if you want @cyrillustan to take a look instead, let me know.