morganjwilliams / pyrolite

A set of tools for getting the most from your geochemical data.
https://pyrolite.readthedocs.io

Convenience features for REE geochemistry and plotting [FEATURE] #35

Closed kaarelmand closed 4 years ago

kaarelmand commented 4 years ago

Is your feature request related to a problem? Please describe.

While pyrolite is already a highly useful tool for working with REE data, some additional convenience features for common workflows in surface REE cycling would be really handy:

1. It is most common in my field to include yttrium together with the REE, inserted between Dy and Ho according to its ionic radius (e.g., [1]). The combination is termed 'REY'. While it's very convenient to fetch all REEs using geochem.ind.REE(), no such function exists for the REYs.

2. Pyrolite is missing the most widely used reference composition for my field: Post-Archaean Australian Shale (PAAS) [2]. This composition data is surprisingly hard to find online, so I'll try to get hold of the book. Added in PR #37.

3. Pm is always below the detection limit, but the literature standard in my field is to include it on the plot nevertheless. In pyrolite, this leaves a hole at the position of Pm. Most plots in the literature instead draw a line straight from Nd to Sm:

   [Screenshot from 2020-02-24 13-26-43: example REY spider plot with a line drawn straight from Nd to Sm]

   I wish there were a way to do this automatically in spiderplot or the REE_v_radii plot.

Describe the solution you'd like

1. A REY() function identical to REE(), except for the addition of Y between Dy and Ho.

2. Add PAAS data to the reference composition database.

3. An additional switch in spiderplot that draws lines to bridge missing data.
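For illustration, the REY ordering in suggestion 1 is just the lanthanides in atomic-number order with Y spliced in directly after Dy. A minimal sketch, assuming that definition; `REE_ORDER` and `rey()` are hypothetical names, not pyrolite's implementation, and `drop_pm` only loosely mimics pyrolite's `dropPm` switch on REE().

```python
# Lanthanides La-Lu in order of atomic number (hypothetical constant, for illustration).
REE_ORDER = ["La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
             "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"]


def rey(drop_pm=False):
    """Return REE symbols with Y inserted between Dy and Ho (hypothetical helper)."""
    elements = list(REE_ORDER)  # copy so the module-level constant is never mutated
    elements.insert(elements.index("Dy") + 1, "Y")
    if drop_pm:
        elements.remove("Pm")
    return elements


print(rey(drop_pm=True))
```

Inserting relative to Dy (rather than at a hard-coded index) keeps Y in the right place whether or not Pm is dropped.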

Describe alternatives you've considered

1. I can just define my own REY array.

2. I can store PAAS data in an array and feed that directly to pyrochem.normalize_to().

3. I haven't really figured out how to do this while using pyroplot yet, especially given #27, but there is a StackOverflow discussion on doing this with matplotlib.
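A rough sketch of the plain-matplotlib workaround, assuming each sample is a list of normalised values in element order with `None` or `float("nan")` for missing entries. Matplotlib breaks a line at NaN, so dropping the missing points while keeping every remaining point at its original x position makes the line run straight across the gap. `bridge_missing()` and `plot_bridged()` are hypothetical helpers, not part of pyrolite or pyroplot.

```python
import math


def bridge_missing(values):
    """Drop missing points but keep each remaining point at its original x position.

    `values` is one normalised pattern in element order, with None or NaN where
    data are missing. Plotting the returned (x, y) draws a straight segment
    across each gap (e.g. Nd -> Sm when Pm is missing).
    """
    x, y = [], []
    for i, v in enumerate(values):
        if v is not None and not math.isnan(v):
            x.append(i)
            y.append(v)
    return x, y


def plot_bridged(ax, labels, values, **kwargs):
    """Plot one pattern on an existing matplotlib Axes, bridging any gaps."""
    x, y = bridge_missing(values)
    ax.plot(x, y, **kwargs)
    # Keep every label (including missing ones such as Pm) on the x-axis.
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    return ax
```

Typical use would be `fig, ax = plt.subplots()`, then `plot_bridged(ax, labels, values, marker="o")` per sample and `ax.set_yscale("log")` for a spider-style axis. Because x positions come from the full label list, the equal spacing (including the slot for Pm) is preserved.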

Relevant References

[1] Bau, M. and Dulski, P. (1996). Distribution of yttrium and rare-earth elements in the Penge and Kuruman iron-formations, Transvaal Supergroup, South Africa. Precambrian Research 79 (Geology and Geochemistry of the Transvaal Supergroup), 37–55. https://doi.org/10.1016/0301-9268(95)00087-9

[2] Taylor, S.R. and McLennan, S.M. (1985). The Continental Crust: Its Composition and Evolution; an examination of the geochemical record preserved in sedimentary rocks. Blackwell, Oxford, 312 pp.

kaarelmand commented 4 years ago

It just occurred to me that REY (suggestion no. 1) could be better implemented simply as a switch to REE() that inserts Y right after Dy.

morganjwilliams commented 4 years ago

Thanks for the feature requests @kaarelmand!

For the Pm broken-lines issue, the simple solution is to discard it from the index/x-axis when building the plot, and I have a quick fix for this ready to go. While it may be a literature standard, including Pm in the index/x-axis labels for pyrolite.plot.spider.REE_v_radii seems a bit disingenuous. While it makes sense to get the spacing right on a spider plot (like the one above), REE_v_radii plots are spaced by ionic radius, so the positions of the other elements are unaffected by whether you include Pm or not.
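To make the spacing argument concrete: on an equally spaced axis an element's x position depends on which other elements are in the list, whereas on a radius-based axis it depends only on its own radius. A small sketch with toy numbers; the values in `toy_radii` are placeholders chosen only to be decreasing, NOT real ionic radii, and both helpers are hypothetical.

```python
def index_positions(elements):
    """Spider-plot spacing: x is simply each element's position in the list."""
    return {el: i for i, el in enumerate(elements)}


def radius_positions(elements, radii):
    """Radius-based spacing: x depends only on each element's own radius."""
    return {el: radii[el] for el in elements}


# Toy "radii" for illustration only; these are NOT real ionic radii.
toy_radii = {"Nd": 4.0, "Pm": 3.0, "Sm": 2.0, "Eu": 1.0}
with_pm = ["Nd", "Pm", "Sm", "Eu"]
without_pm = ["Nd", "Sm", "Eu"]

# Dropping Pm shifts Sm one slot left on an equally spaced axis...
print(index_positions(with_pm)["Sm"], index_positions(without_pm)["Sm"])
# ...but leaves its radius-based position unchanged.
print(radius_positions(with_pm, toy_radii)["Sm"],
      radius_positions(without_pm, toy_radii)["Sm"])
```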

On the spider plot side, for elements other than Pm it's informative to see where you're missing data, and the line-across-the-gap method would start to interfere with that (especially when it comes to larger datasets, and where the density methods come in). The added complexity is that plotting would then need to be done on a per-line basis (or at least per-missing-data-pattern basis). Let me know if this REE_v_radii solution doesn't fit (i.e. you need the Pm label on the x-axis) and perhaps we can work something out which does.
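To illustrate the "per-missing-data-pattern" point, a rough sketch that buckets samples by which elements are missing, assuming each sample is a list of floats with NaN for missing values. Each bucket would then need its own plotting pass (e.g. its own gap-bridging), which is where the extra complexity comes from. `group_by_missing_pattern()` is a hypothetical helper, not part of pyrolite.

```python
import math
from collections import defaultdict


def group_by_missing_pattern(rows):
    """Group rows (lists of floats, NaN = missing) by which positions are missing.

    The key is a tuple of booleans, True where a value is missing. Rows sharing
    a key could be drawn together in one plotting call.
    """
    groups = defaultdict(list)
    for row in rows:
        pattern = tuple(math.isnan(v) for v in row)
        groups[pattern].append(row)
    return dict(groups)
```

With large datasets the number of distinct patterns can grow quickly, so this trades one plotting call for the whole dataset against one call per pattern.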

As for the PAAS data, once you find it, it should fit within the current refcomp folder. Have a look at the other data files there, and let me know if you have any questions about getting this in the right format. Have a look at McLennan (2001) too; it includes an updated PAAS composition and a few others which you might want to bring in.
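Until PAAS is in the reference database, the underlying operation is just element-wise division by the reference composition. A minimal sketch, assuming both compositions are plain dicts of element symbol to concentration in the same units; the numbers below are placeholders, NOT PAAS values, and this is not how pyrolite's refcomp files or `normalize_to()` are implemented. `normalize_to_reference()` is a hypothetical name.

```python
def normalize_to_reference(sample, reference):
    """Element-wise normalisation: sample[el] / reference[el].

    Both arguments map element symbols to concentrations in the same units.
    Elements absent from the reference (or with a zero/None reference value)
    are skipped rather than raising an error.
    """
    return {el: value / reference[el]
            for el, value in sample.items()
            if reference.get(el)}


# Placeholder numbers only; these are NOT PAAS values.
reference = {"La": 2.0, "Ce": 4.0}
sample = {"La": 4.0, "Ce": 2.0, "Y": 10.0}
print(normalize_to_reference(sample, reference))
```

Skipping elements the reference lacks (Y here) mirrors the practical situation where a reference composition does not report every element you measured.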

For REY, I think creating an indexing function akin to pyrolite.geochem.ind.REE might be the way to go. This could be supplemented with a selection method on pyrolite.geochem.pyrochem and a plotting method akin to pyrolite.plot.pyroplot.REE.

Finally, let me know whether you'd like to work on any of these issues yourself (in which case feel free to ask any pyrolite/development questions), otherwise I'll have a crack.

McLennan, S.M. (2001). Relationships between the trace element composition of sedimentary rocks and upper continental crust. Geochemistry, Geophysics, Geosystems 2. https://doi.org/10.1029/2000GC000109

kaarelmand commented 4 years ago

Thanks for the thorough reply!

For the 1st and 2nd suggestions, I could certainly try my hand at submitting some PRs, though it would be my first official code contribution, so I might have some questions here or on gitter later, if you don't mind.

As for jumping the Pm gap, I agree that just skipping Pm usually works, and if you can use the REE_v_radii plot, the problem is already solved. I guess the issue is that I'm most often working as a contributor on someone else's paper and don't have much creative control when making figures. These authors usually implore me to go with the most traditional plotting practices (for my field): a spider-plot-like, equally spaced distribution of elements on the x-axis, and a line over Pm. Perhaps the ideal solution here would be to style the Pm gap line differently (e.g., dashed), so as to keep the REY pattern continuous but highlight the lack of Pm data. I imagine this would introduce a lot of complexity for little gain, so perhaps this issue could go on the back burner.
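A rough sketch of the dashed-gap idea, assuming one pattern is a list of values in element order with `None` or NaN where data are missing. It splits the pattern into solid runs of consecutive observed points plus "bridge" segments joining the points either side of each gap, so the bridges can be drawn with a different line style. `split_segments()` and `plot_with_dashed_gaps()` are hypothetical helpers; only the standard matplotlib `Axes.plot` call is used.

```python
import math


def split_segments(values):
    """Split one pattern into solid runs and bridging segments.

    Returns (solid, bridges), each a list of (x, y) list pairs. Solid runs cover
    consecutive observed points; each bridge joins the last observed point before
    a gap to the first observed point after it (e.g. Nd -> Sm across Pm).
    """
    observed = [(i, v) for i, v in enumerate(values)
                if v is not None and not math.isnan(v)]
    solid, bridges, run = [], [], []
    for i, v in observed:
        if run and i != run[-1][0] + 1:
            # Gap found: record a bridge from the end of the current run, then close it.
            bridges.append(([run[-1][0], i], [run[-1][1], v]))
            solid.append(([p[0] for p in run], [p[1] for p in run]))
            run = []
        run.append((i, v))
    if run:
        solid.append(([p[0] for p in run], [p[1] for p in run]))
    return solid, bridges


def plot_with_dashed_gaps(ax, values, color="k"):
    """Draw observed runs as solid lines and gap bridges as dashed lines."""
    solid, bridges = split_segments(values)
    for x, y in solid:
        ax.plot(x, y, color=color, ls="-", marker="o")  # marker keeps lone points visible
    for x, y in bridges:
        ax.plot(x, y, color=color, ls="--")
    return ax
```

This is per-line plotting, so for large datasets it is slower than one vectorised call; that is the complexity trade-off mentioned above.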

And to clarify, do you think a pyrolite.geochem.ind.REY function would be better, or just an includeY switch for the pyrolite.geochem.ind.REE function (and the other REE methods), like the dropPm switch?

morganjwilliams commented 4 years ago

Feel free to ask anything related to the issues here, but general discussion or development questions might be better directed to Gitter. Happy to help out wherever needed!

I'll have a bit more of a think about it, but I think an REY() function could be less verbose (i.e. df.pyrochem.REY() over df.pyrochem.REE(includeY=True)). There might need to be some modifications to plotting functions to deal with Y.
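To compare the two call styles, a sketch using pandas' accessor mechanism (`pandas.api.extensions.register_dataframe_accessor`). The accessor name `reesel` and its methods are hypothetical stand-ins, not pyrolite's `pyrochem` implementation; the point is only that `REY()` can be a one-line wrapper over an `include_y` flag, so both spellings can coexist.

```python
import pandas as pd

# Lanthanides La-Lu in order of atomic number (hypothetical constant, for illustration).
REE_ORDER = ["La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
             "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"]


@pd.api.extensions.register_dataframe_accessor("reesel")  # hypothetical accessor name
class REESelector:
    def __init__(self, df):
        self._df = df

    def REE(self, include_y=False):
        """Return the REE columns present in the frame, in REE order (Y after Dy if requested)."""
        order = list(REE_ORDER)
        if include_y:
            order.insert(order.index("Dy") + 1, "Y")
        return self._df[[el for el in order if el in self._df.columns]]

    def REY(self):
        """Shorthand for REE(include_y=True)."""
        return self.REE(include_y=True)


df = pd.DataFrame({"Ho": [1.0], "Y": [2.0], "La": [3.0], "SiO2": [50.0], "Dy": [4.0]})
print(df.reesel.REY())
```

Keeping the ordering logic in one place (the flag) and exposing REY() as a thin wrapper avoids the two selections drifting apart.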

For the Pm data, I'll think about this a bit more, but perhaps we can create a separate issue for it, as it's likely to come up in other spider plot discussions.

kaarelmand commented 4 years ago

Meanwhile, I discovered this hilarious paper (have a read of the abstract + graphical abstract for a good laugh), and feel that I should likely add this new European Shale as well.

And while I'm at it, I should round out my PR by adding some updated variants of PAAS, EUS, NASC and MUQ, as well as more from Condie (1993) and others in the above paper.

morganjwilliams commented 4 years ago

It's ridiculous how similar they all are! Looks like we'll have to source some more hard rock reference compositions to keep up.

I'll keep an eye out and look to merge soon after you finish it up.

morganjwilliams commented 4 years ago

Afternoon @kaarelmand. I ended up adding the REY functionality - check it out on the develop branch and let me know if this is roughly what you were after. It passes basic tests, and follows the ordering from the spider plot you used above. While there isn't a plotting method for this one, you should be able to use df.pyrochem.REY.pyroplot.spider() or similar to achieve that plot. If this is up to scratch I'll close the issue. Cheers!

morganjwilliams commented 4 years ago

For reference, it was added with 9582f1085c30d300156ebca72544338a4c6f5222 (you can check the commit for an overview of additions).

morganjwilliams commented 4 years ago

Closing this one, as this seems to have addressed the issue.