zackbatist / open-archaeo

A list of open source archaeological software and resources
Creative Commons Zero v1.0 Universal

Bioarchaeology and archaeogenetics #20

Closed. nevrome closed this issue 3 years ago.

nevrome commented 3 years ago

James' PR (#19) is related to the question of how open-archaeo should cover the rather big world of aDNA-related bioinformatics. Having archaeodiet and gargammel in the list is somewhat arbitrary, given that there is a lot more aDNA software.

Where did you want to draw the line here, @zackbatist? Intuitively, I would have suggested leaving it out, except for tools that are very clearly intended for use by archaeologists. The overwhelming majority of aDNA software is written by geneticists for geneticists. On the other hand, James also pointed out to me that there is no comprehensive aDNA software list so far (at least to our knowledge).

joeroe commented 3 years ago

Not to speak for @zackbatist, but I added that tag and most of these entries and afterwards we did talk a bit about where to draw the line. As you say, including all aDNA tools would be too much, but the ones more directly related to archaeology could be included. The current list probably needs some pruning to actually stick to that logic.

MartinHinz commented 3 years ago

@joeroe, agreed. Since the selection will always be subjective, and both fields are intertwined anyway, why not include anything whose repo owner feels it is immediately helpful for archaeologists rather than for paleogeneticists?

zackbatist commented 3 years ago

This list is always going to have an arbitrary scope; the best we can do is be clear about our decisions. Personally, I prefer to keep a broad scope. And as @MartinHinz pointed out, it's also a matter of the maintainers of each project identifying the item as related to archaeology. Maintainers identifying their repos as relating to archaeology not only serves to delimit what should or should not be included, but also frames the potential for items to be discovered and included in open-archaeo at all.

To illustrate this point, the main discovery mechanism I've used is to identify a repo I don't recognize (typically by happenstance), check out the profiles and all other repos created by its contributors, and repeat this until I've gone through the entire cluster; basically a manual crawl. If I don't see any keywords that identify a repo as relating to archaeology (archaeo, palaeo, etc.) during a cursory scan, I tend to move on. New entries are also added when maintainers get in touch directly or submit a PR to include their software.
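For anyone curious how that manual crawl could be automated, here is a minimal Python sketch of the procedure as described: a breadth-first walk from a seed repo to its contributors, then to their other repos, flagging any repo whose name or description contains a keyword such as archaeo or palaeo. All repo names, descriptions and contributor mappings below are hypothetical in-memory stand-ins; the sketch does not call the GitHub API, and in practice this data would have to be looked up there. Like the manual process, it keeps walking through non-matching repos until the cluster is exhausted; the keyword check only decides what gets flagged.

```python
from collections import deque

# Hypothetical stand-ins for data one would look up by hand on GitHub.
REPO_DESCRIPTIONS = {
    "alice/seriation": "Seriation tools for archaeological assemblages",
    "alice/dotfiles": "My personal config",
    "bob/palaeo-climate": "Palaeoclimate proxy reconstruction",
    "bob/web-scraper": "Generic web scraping utilities",
    "carol/pottery-db": "Database schema for archaeological pottery",
}
REPO_CONTRIBUTORS = {
    "alice/seriation": ["alice", "bob"],
    "alice/dotfiles": ["alice"],
    "bob/palaeo-climate": ["bob"],
    "bob/web-scraper": ["bob", "carol"],
    "carol/pottery-db": ["carol"],
}
USER_REPOS = {
    "alice": ["alice/seriation", "alice/dotfiles"],
    "bob": ["bob/palaeo-climate", "bob/web-scraper", "alice/seriation"],
    "carol": ["carol/pottery-db", "bob/web-scraper"],
}
KEYWORDS = ("archaeo", "palaeo", "paleo")


def looks_relevant(repo: str) -> bool:
    """The 'cursory scan': does the repo name or description contain a keyword?"""
    text = (repo + " " + REPO_DESCRIPTIONS.get(repo, "")).lower()
    return any(keyword in text for keyword in KEYWORDS)


def crawl(seed_repo: str) -> list[str]:
    """Breadth-first walk: repo -> contributors -> their other repos, until the cluster is exhausted."""
    seen_repos = {seed_repo}
    seen_users: set[str] = set()
    queue = deque([seed_repo])
    candidates = []
    while queue:
        repo = queue.popleft()
        if looks_relevant(repo):
            candidates.append(repo)
        # Visit each contributor only once, queueing any repos not yet seen.
        for user in REPO_CONTRIBUTORS.get(repo, []):
            if user in seen_users:
                continue
            seen_users.add(user)
            for other in USER_REPOS.get(user, []):
                if other not in seen_repos:
                    seen_repos.add(other)
                    queue.append(other)
    return candidates
```

Starting from `alice/seriation`, the sketch flags `alice/seriation`, `bob/palaeo-climate` and `carol/pottery-db`, and passes over the unrelated dotfiles and scraper repos while still following their contributors.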

All that being said, perhaps we should identify a few areas, like archaeogenetics, palaeoenvironment reconstruction, XRF/elemental analysis, aerial scanning, tablet recording interfaces, etc., that might benefit from a deep dive, in order to ensure that we have included all existing relevant tools, and perhaps also designate categories for those that are tangential to our main scope.

I generally need to do a better job curating tags and categories and delimiting their scopes. @joeroe and I have discussed documenting each tag more thoroughly to help make this a more useful dataset, but my progress has been halted by other work that has cropped up.

jfy133 commented 3 years ago

Just to jump in (as I said privately to @nevrome): if you do decide to go the 'broad' route, I am happy to help with a 'brain dump' of aDNA tools.

My background is actually in archaeology (B.Sc.), so I also have a rough feeling of what is specifically archaeo/palaeogenetics vs. just genetics (or aDNA of ecology etc).

Let me know if you're interested.

nevrome commented 3 years ago

I could very well imagine joining @jfy133 in providing a software list at the interface of archaeology and archaeogenetics. I hope our poseidon project will soon be ready to fit exactly into this gap 😉

Beyond that it might be a fun task for the SIG to cover these specific domains you listed, Zack. We could split up into small teams to do some of these deep dives.

zackbatist commented 3 years ago

That would be really helpful! It would also be really helpful if you could explain what is meant by the term archaeo/palaeogenetics, as well as other related terms. I must admit that I have very little experience with that/those fields, so getting some input from people who are actually more familiar with it would be fantastic.

jfy133 commented 3 years ago

Ooof, good question ;).

I would define archaeogenetics as the analysis of degraded DNA from archaeological/anthropological sources (skeletons of humans/domesticated animals/plants, artefacts, hominin DNA from cave sediments, etc.). Palaeogenetics could be considered broader and could include less archaeology-related work, such as degraded DNA from sediment used to address ecological questions.

Generally, most archaeogenetic tools are designed specifically to account for the degraded nature of the DNA.

jfy133 commented 3 years ago

A little brain dump of software, since it was relevant for something else (and, later, why each is specific to aDNA):


Damage profilers

Metagenomic classifiers

aDNA Simulators

aDNA Damage-aware Genotypers

Low-coverage contamination estimation

Low-coverage adapted population-genetics tools

nevrome commented 3 years ago

Can't add much to that!

jfy133 commented 3 years ago

Yeah, I couldn't find contamMix either; it seems to be some random script floating around colleague networks.

I originally had put angsd in the list, but when I was adding it to nf-core/eager, I noticed it actually seemed to be a mostly pop-gen tool, with only a couple of functions adapted for low coverage thrown in. So I'm not sure whether it should be included. On the other hand, those functions are very commonly used...

jfy133 commented 3 years ago

Note to self, a few more:

TCLamnidis commented 3 years ago

Nothing to add currently, but just confirming: to my knowledge, ContamMix is available on request from the author and indeed has no homepage.

jfy133 commented 3 years ago

I've added links to my weekend brain dump.

I would remove contamMix for that reason. It's not open per se.

jfy133 commented 3 years ago

A few more:

jfy133 commented 3 years ago

@zackbatist, would you be against me tweeting for suggestions? I can ask the community if we're missing anything...

jfy133 commented 3 years ago

To make things easier for @zackbatist :

Adds all the above, and corrects some things. Please check everything. In particular, I wasn't sure how multiple authors work or should be displayed. For example, nf-core pipelines come from the organisation, but I know who the authors are; however, in some cases this could be a lot of people.

zackbatist commented 3 years ago

Thanks @jfy133 ! I updated #22, see my comment there for further action required. Feel free to share this, we can use as much domain expertise as we can get!

Authorship is still a bit of a mess, partly due to restrictions imposed by the spreadsheet format, and partly due to the different affordances for collaboration in open source software development than what researchers may be used to. Currently I'm just going with the repository's maintainer plus any contributors who are explicitly referred to in the readme or in other documentation that is easy to come across. Please add any individual authors, as you see fit.

altinisik commented 3 years ago

Great list! Thanks for the effort.

ContamLD for Low-coverage contamination estimation

jfy133 commented 3 years ago

Note to self: the tool from Becky Cribdon/the Allaby lab whose name I'm blanking on at the moment. Actually no, it's not aDNA-specific (just applied to it).

jfy133 commented 3 years ago

@zackbatist @nevrome I think we're done for now, so this issue could be closed. However, I notice that the website itself hasn't updated.

Note that in my PRs I only updated the CSV and no other files, so depending on where the website draws from, maybe a new PR needs to be made (unless you only update the website periodically).

zackbatist commented 3 years ago

Just updated the site manually, thanks for the reminder.

bbartholdy commented 3 years ago

@jfy133, is there a reason you left out cuperdec?

jfy133 commented 3 years ago

'Cauuussseeee I forgot 😬

bbartholdy commented 3 years ago

I didn't realise it's on CRAN now, congrats!

zackbatist commented 3 years ago

Thanks, just merged this PR in.