orasanen / ALICE

Automatic LInguistic Unit Count Estimator (ALICE)
Error message regarding wave file form #19

Closed rangpark92 closed 3 years ago

rangpark92 commented 3 years ago

Hello,

I have a question regarding the wave file format. When I ran audio files collected through LENA devices, there weren't any problems related to the wave file format. However, when I tried to run audio files collected through other USB recorders (recorder 1 and recorder 2), I did not get any output and got the error message below. Are there any specific file format requirements for using ALICE?

ValueError: Unknown wave file format
getFinalEstimates.py:10: UserWarning: genfromtxt: Empty input file: "/Users/yerangpark/ALICE/tmp_data/features/ALUCs_out_individual.txt"
  F = genfromtxt(curdir + "/tmp_data/features/ALUCs_out_individual.txt", delimiter='\t')
ALICE completed. Results written to /Users/yerangpark/ALICE/ALICE_output.txt and /Users/yerangpark/ALICE/diarization_output.rttm.)

orasanen commented 3 years ago

Hi,

This is curious! Is there any chance you could send me (by email; see ALICE readme) a short audio sample from one of the recorders, so I could try to reproduce the error? It can also be a clip without any speech in it, in case you have privacy limitations with your dataset.

BR, Okko

rangpark92 commented 3 years ago

Hi Okko,

I've attached the audio files that I tested (both of them were just recorded by me - so no privacy limitations :)). It would be really helpful if you can try running these files to see if the same error message appears.

Thanks so much, Ye Rang

--

Ye Rang Park, PhD | Postdoctoral Researcher | Psychology and Economics of Poverty

Center for Effective Global Action | UC Berkeley

rangpark92 commented 3 years ago

I'm sorry, it looks like one of the files is too large - would you mind testing the first file named "FILE0002.wav"? Thank you!

On Wed, Apr 7, 2021 at 3:53 PM Ye Rang Park PhD @.***> wrote:

Hi Okko,

I'm sending you the audio files that I tested (both of them were just recorded by me - so no privacy limitations :)). It would be really helpful if you could try running these files to see if the same error message appears. I'm sending the two files separately because I got an email saying that the message was too large to send. Hopefully you received both files?

Thanks so much, Ye Rang

orasanen commented 3 years ago

I'm sorry, but I'm not sure where the file you mentioned is. You can send it to okko.rasanen@tuni.fi, thanks!

orasanen commented 3 years ago

Thanks for the files! The culprit seems to be that the files are in DVI ADPCM format, which is a compressed audio format. Scipy's WAV reader could not read it (neither can, e.g., Praat). I have now fixed the problem by switching from scipy to Librosa-based audio file reading. Both of your files seem to work now.

It is still possible that there are other compressed audio formats that Librosa cannot read (I'm not really familiar with all the compression formats used nowadays by different devices, nor with which of them Librosa supports).

You can now pull the updated version and process your data. Let me know if the problem persists. Also, thanks for bringing this issue to our attention!
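
For anyone hitting the same error, here is a minimal sketch (standard library only) of how one might check which encoding a WAV file uses before running it through ALICE. The helper names `wav_format_tag` and `describe_wav` are hypothetical and not part of ALICE. The format tag values are the standard RIFF/WAVE codes, where 0x0011 is DVI (IMA) ADPCM and 0x0001 is plain PCM.

```python
import struct

# Common WAVE format tags stored in the 'fmt ' chunk of a RIFF/WAVE file.
WAVE_FORMATS = {
    0x0001: "PCM",
    0x0002: "MS ADPCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0011: "DVI/IMA ADPCM",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE",
}


def wav_format_tag(path):
    """Return the integer format tag from the 'fmt ' chunk (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12:
            raise ValueError("file too short to be a RIFF/WAVE file")
        riff, _size, wave_id = struct.unpack("<4sI4s", head)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    raise ValueError("truncated 'fmt ' chunk")
                (tag,) = struct.unpack("<H", tag_bytes)
                return tag
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def describe_wav(path):
    """Return a human-readable name for the file's encoding (hypothetical helper)."""
    tag = wav_format_tag(path)
    return WAVE_FORMATS.get(tag, "unknown (0x%04X)" % tag)
```

Calling `describe_wav("FILE0002.wav")` on a recording like the one discussed here would be expected to report a compressed format rather than "PCM", which is a hint that a reader limited to uncompressed WAV data may reject the file.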

rangpark92 commented 3 years ago

Hi Okko,

Thank you so much! It works on my end too, happy to help in any way I can :)
