murphymadeleine21 commented 3 years ago

Right now, every subject has 4 values, one per IgG subclass, and all of them fall into that subject's class category (each subject belongs to exactly one class). Some values are cut off because they looked like major outliers; for now I just set the y-axis to -0.5 to 2. I requested Dr. Meyer as a reviewer since this seemed to be a figure he knew about, but if you want @cyrillustan to take a look instead, let me know.

figureEV5
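A minimal sketch of how the "4 values per subject" table could be assembled with pandas. It assumes a long-format DataFrame with hypothetical columns `subject`, `class`, `IgG`, `antigen` (containing `"gp120"` and `"p24"` entries), and `value`. These names are illustrative, not the repository's actual schema, and repeated measurements are averaged before the ratio is taken.

```python
import pandas as pd


def gp120_p24_ratios(long_df: pd.DataFrame) -> pd.DataFrame:
    """Return one gp120/p24 ratio per (subject, class, IgG) row.

    Assumes hypothetical long-format columns: subject, class, IgG, antigen, value.
    """
    # Average repeated measurements, then place gp120 and p24 side by side as columns
    wide = long_df.pivot_table(
        index=["subject", "class", "IgG"],
        columns="antigen",
        values="value",
        aggfunc="mean",
    )
    # Element-wise ratio for each subject/IgG pair
    ratios = (wide["gp120"] / wide["p24"]).rename("ratio")
    return ratios.reset_index()
```

Each subject then contributes one row per IgG subclass, which matches the layout described above.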

codecov[bot] commented 3 years ago

Codecov Report

Merging #288 (86d23bd) into master (1f29053) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #288   +/-   ##
=======================================
  Coverage   87.97%   87.97%           
=======================================
  Files           7        7           
  Lines         449      449           
=======================================
  Hits          395      395           
  Misses         54       54

| Flag | Coverage Δ |
|---|---|
| unittests | `87.97% <ø> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1f29053...86d23bd.

murphymadeleine21 commented 3 years ago

Okay, I can do that:

figureEV5

But is there a reasonable way to throw out outlier values? I don't love the arbitrary y-axis scale.

aarmey commented 3 years ago

By outliers do you mean negative numbers? Are they coming from a specific measurement? Potentially you could just clip the negative numbers to zero, but I'd like to know how common they are before doing that.

aarmey commented 3 years ago

Actually, one other thing: it'd be good to have IgG on the x-axis and to represent the subject groups with different colors.
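A minimal matplotlib sketch of that layout: IgG subclass on the x-axis and one color per subject class, with points dodged side by side within each IgG slot. It assumes a DataFrame with hypothetical columns `IgG`, `class`, and `ratio`; the function name is also hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_ratio_by_igg(df: pd.DataFrame, ax=None):
    """Strip plot of ratios: IgG subclass on x, one color (and legend entry) per class."""
    if ax is None:
        _, ax = plt.subplots()
    iggs = sorted(df["IgG"].unique())
    classes = sorted(df["class"].unique())
    slot = {g: i for i, g in enumerate(iggs)}  # x position of each IgG subclass
    width = 0.8 / len(classes)  # horizontal space given to each class within a slot
    rng = np.random.default_rng(0)  # small fixed-seed jitter so overlapping points stay visible
    for j, cls in enumerate(classes):
        sub = df[df["class"] == cls]
        xpos = sub["IgG"].map(slot).to_numpy(dtype=float)
        # Shift this class to its own sub-slot, plus jitter within that sub-slot
        xpos += -0.4 + width * (j + 0.5) + rng.uniform(-width / 4, width / 4, len(sub))
        ax.scatter(xpos, sub["ratio"], s=12, label=str(cls))
    ax.set_xticks(range(len(iggs)))
    ax.set_xticklabels(iggs)
    ax.set_xlabel("IgG subclass")
    ax.set_ylabel("gp120/p24 ratio")
    ax.legend(title="Subject class")
    return ax
```

If seaborn is already a dependency, `seaborn.stripplot(data=df, x="IgG", y="ratio", hue="class", dodge=True)` produces a similar layout in one call.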

murphymadeleine21 commented 3 years ago

Yes, there are some negative numbers, and I think one extremely large value (in the thousands). I could try setting all values outside a certain range to 0 or NaN?

murphymadeleine21 commented 3 years ago

It looks like there are 64 negative values, and I don't see any trend across classes that would explain them. However, IgG4 accounts for 54 of the negative values.
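Counts like these can be reproduced with a short pandas grouping. This is a sketch assuming a DataFrame with hypothetical `IgG` and `ratio` columns; NaN ratios compare as not negative and are therefore not counted.

```python
import pandas as pd


def negatives_by_igg(ratios: pd.DataFrame) -> pd.Series:
    """Count negative ratio values per IgG subclass (hypothetical columns: IgG, ratio)."""
    is_negative = ratios["ratio"] < 0  # NaN < 0 evaluates to False
    return is_negative.groupby(ratios["IgG"]).sum().astype(int)
```

Grouping by `[ratios["IgG"], ratios["class"]]` instead would break the counts down by subject class as well.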

aarmey commented 3 years ago

I think you can clip them to 0. We do this in the factorization.
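A minimal pandas sketch of clipping negatives to zero. It is not necessarily how the factorization code does it, and the `ratio` column name is hypothetical. Only the lower bound is set, so large values pass through unchanged.

```python
import pandas as pd


def clip_negative_ratios(ratios: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with negative values in the hypothetical 'ratio' column set to 0.

    NaN entries stay NaN, and positive values are not capped.
    """
    return ratios.assign(ratio=ratios["ratio"].clip(lower=0))
```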

aarmey commented 3 years ago

Also, what is the conclusion here? Does the ratio distinguish progression? We should get you past plotting on its own and toward making analytical determinations.

murphymadeleine21 commented 3 years ago

I see a few patterns:

- IgG4 has a consistent gp120/p24 ratio very close to 0 across subject classes, and the IgG4 ratios vary much less than the others (almost all are 0).
- For all IgGs except IgG3, progressors tend to have higher variability and higher average gp120/p24 ratios. I don't know whether these differences are statistically significant enough to say that the ratio defines progression, though. I would also need to look more into the biology of IgG4 binding to gp120 and p24 to see why those measurements are so different.
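One way to check whether a progressor vs non-progressor difference within a single IgG subclass is larger than chance is a two-sided permutation test on the difference in means. This technique is swapped in here as an example and is not taken from the thread. The sketch below uses NumPy only, and the function and parameter names are hypothetical.

```python
import numpy as np


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference of means, permutation p-value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # randomly reassign group labels
        diff = shuffled[: a.size].mean() - shuffled[a.size :].mean()
        # Small tolerance so ties with the observed statistic survive float rounding
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (n_perm + 1)
```

Running it once per IgG subclass means four tests, so the p-values would need a multiple-comparison correction. This test compares means only; it says nothing directly about the difference in variability.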

aarmey commented 3 years ago

Well, the task is to distinguish non-progression from progression, so I think you'll need to build the prediction model and see how well it does.

Based on that, we can then include this in the supplement and adjust our manuscript's language to address the reviewer comment. The reviewer wanted us to soften our claim that it would be "nearly impossible" to identify this antigen trend. I'd like to see your suggested text for addressing this.

murphymadeleine21 commented 3 years ago

Great! I'm happy to start thinking more critically and independently, and I'll get started on seeing what we can find with the predictions.

For the prediction model, should I try to predict class using these averaged ratio values (i.e., this subsetted and averaged dataset)?
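A minimal sketch of such a model, assuming scikit-learn is available. It also assumes a hypothetical layout: one row per subject with the four IgG ratio values as features, and binary labels (1 = progressor, 0 = non-progressor). Logistic regression with stratified cross-validation is used here as an example, not as the project's established method. Cross-validation gives an out-of-sample accuracy estimate instead of a training-set fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def progression_cv_accuracy(X, y, n_splits=5, seed=0):
    """Mean cross-validated accuracy for predicting progression from ratio features.

    X: (n_subjects, 4) array, one gp120/p24 ratio per IgG subclass (hypothetical layout).
    y: (n_subjects,) binary labels, 1 = progressor, 0 = non-progressor (hypothetical encoding).
    """
    # Scaling is fit inside each training fold, so no information leaks from the test fold
    model = make_pipeline(StandardScaler(), LogisticRegression())
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores))
```

With placeholder random data this only demonstrates the mechanics. How well the real ratios separate the groups is the open question in this thread.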

murphymadeleine21 commented 3 years ago

Also, there are a handful of values > 25 (about 18?), about 57 values > 10, and only 4 values > 100, out of ~700 measurements total. I'm not sure what the right upper bound for clipping should be.
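Threshold counts like these can be tabulated in one pass. This is a NumPy sketch with a hypothetical function name; non-finite entries (NaN, or inf from a zero p24 value) are dropped before counting.

```python
import numpy as np


def count_above(values, thresholds=(10, 25, 100)):
    """Count how many finite values exceed each threshold."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]  # drop NaN/inf so they don't distort the counts
    return {t: int((v > t).sum()) for t in thresholds}
```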

aarmey commented 3 years ago

Why would it be something other than 0?

murphymadeleine21 commented 3 years ago

Alternatively, I could just clip the values > 100 but limit what the graph shows to ~15.

figureEV5
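The approach described above (cap values at 100, then limit the plotted range to about 15) can be sketched as follows. The function and parameter names are hypothetical. The sketch also reports how many points the display cutoff would hide, so the figure caption can state it.

```python
import numpy as np


def clip_for_plot(values, upper=100.0, ylim_top=15.0):
    """Clip to [0, upper] and count points a plot limited to ylim_top would hide."""
    clipped = np.clip(np.asarray(values, dtype=float), 0.0, upper)
    hidden = int(np.sum(clipped > ylim_top))
    return clipped, hidden
```

In matplotlib the display cutoff itself would be `ax.set_ylim(top=15)`. Capping at 100 changes the stored values, while the y-limit only changes what is drawn.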

murphymadeleine21 commented 3 years ago

Well, I wasn't sure whether clipping large values to 0 also made sense, if that's what you're asking. I meant more: at what point do we say a value is too large?

aarmey commented 3 years ago

No, you should only clip negative values to zero.

murphymadeleine21 commented 3 years ago

Right, okay, that's what I have it doing (clipping only negative values to 0), but I also have it clipping values > 100 to 100. We have one value of ~7000 that I think should be thrown out somehow. But if we keep these large values, we have to hide some of them from the graph (for example, by setting ylim = 15). You can see this in the next commit:

murphymadeleine21 commented 3 years ago

I also changed my .gitignore file. Should I commit that too?

aarmey commented 3 years ago

Yes

murphymadeleine21 commented 3 years ago

figureEV5

meyer-lab / systemsSerology

Gp120/p24 Ratio figure #288
