pratyushasharma / sw-combinatoriality

Dataset and Codebase for paper on "Contextual and Combinatorial Structure in Sperm Whale Vocalisations"

Availability of acoustic recordings #1

Open abfleishman opened 4 months ago

abfleishman commented 4 months ago

Are the acoustic recordings associated with this paper available somewhere? After reading the data availability statement in the paper, I expected to find them in this GitHub repository.

khughitt commented 2 months ago

@abfleishman I believe the dataset used is https://ieee-dataport.org/documents/dominica-dataset#files

abfleishman commented 2 months ago

@khughitt Interesting! It looks like I have to pay to access it. Does that sound correct?

khughitt commented 2 months ago

@abfleishman It appears so. The data itself is open and has a CC license, which is good. I reached out to the listed author to see if it is available elsewhere, and if not, if they would consider uploading it to Zenodo or another open data repository.

anmoisio commented 2 months ago

That dataset contains only echolocation click recordings, no codas. That's not the dataset used in this paper.

khughitt commented 2 months ago

@anmoisio The question was about the source of the original recordings, which I believe is what is hosted on the IEEE link I shared, although I could be wrong. You are correct that the paper does not start from the raw data, but instead builds on a couple of different processed coda tabular data files which are hosted in the repo at https://github.com/pratyushasharma/sw-combinatoriality/tree/main/data.

anmoisio commented 2 months ago

If I've understood correctly, codas are the sequences of clicks these whales use for communicating, while echolocation clicks are different. The other details don't match either: the paper says the data is from the years between 2005 and 2018, whereas the linked dataset was collected in September 2023.

khughitt commented 2 months ago

@anmoisio Ah, good catch! You are correct! The dates don't line up and the dataset does appear to be focused on echolocation clicks and not codas. The author of the IEEE dataset discusses both of these here: https://arxiv.org/pdf/2401.00900.

In that case, if the original audio recordings are publicly available, I'm not sure where. I couldn't find anything else online or via the Dominica Sperm Whale Project website.

anmoisio commented 2 months ago

@khughitt I couldn't find the correct dataset either, unfortunately.