Training sets coding - GitHub issues

ecm2021 commented 3 years ago

Hello, I just wanted to ask about the specifics of writing code for training sets. For my dissertation, I'm hoping to analyse two corpora for both within-culture and between-culture predictions. Just as in 12 and 13 of Pearce, M.T. (2018), I want to use only the LTM model, with 10-fold cross-validation for the within-culture predictions, and then use this model for the comparison culture. However, as I'm relatively new to Emacs/Lisp and can't find an example of this on the wiki, I'm not sure how to go about the coding side. For the within-culture predictions, I originally thought it was like this: (idyom:idyom 1 '(cpitch) '(cpitch) :models :ltm :k10)

And for the between-culture predictions: (idiom:idiom 1 '(cpitch) '(cpitch) :select :models :ltm :k10 2)

If you could show me how to correct my code, that would be greatly appreciated! Thank you and all the best, Eden

Kappers commented 3 years ago

Hi Eden!

There are a few things worth noting here, please do let me know if anything isn't clear.

(1) For the between-cultures analysis, I think you want to use pretraining and not cross-validation: cross-validation would additionally train the model (within each fold) on the very cultural corpus being analysed. The paper you cited did not claim to use cross-validation for the between-cultures analysis.

Therefore, you will need to do two things for the between-cultures model:

Specify a pretraining dataset, corresponding to the cultural corpus you wish to train your model on, using e.g. :pretraining-ids 2.
Set :k 1 to ensure there is no training other than the pretraining corpus.

e.g. (idyom:idyom 1 '(cpitch) '(cpitch) :models :ltm :k 1 :pretraining-ids 2)

(2) Also, for the between-cultures model, you appear to specify both the target and source viewpoints, '(cpitch) '(cpitch), and then also attempt to enable viewpoint selection with :select.

If you wish to use automatic viewpoint selection, I would advise specifying just your target viewpoints, followed by :select and its accompanying parameters as described in the wiki, i.e. do not include any source viewpoints.

e.g. (idyom:idyom 1 '(cpitch) :select :basis :pitch-short :models :ltm :k 1 :pretraining-ids 2)

Warning: when using :pretraining-ids, I am not sure whether viewpoint selection optimizes the source viewpoints for the pretraining dataset, the analysis dataset, or both. @mtpearce might be able to clarify this.

Given that you do not specify :select for your within-cultures model, maybe you do not intend to perform viewpoint selection at all?

For now, I hope that helps! Tom

ecm2021 commented 3 years ago

Hi Tom,

Thank you for your help! Unfortunately, when I input what you suggested, I keep getting an error. Can you see where I've gone wrong?

All the best, Eden

Kappers commented 3 years ago

Hi Eden,

Apologies, this was my mistake: the :pretraining-ids parameter expects a list of dataset ids, hence the type warning.

Passing the dataset id like this should fix things: :pretraining-ids '(2).
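To illustrate why the list form matters, here is a small Common Lisp sketch. The function sketch-describe-pretraining is hypothetical and not part of IDyOM; it only checks that its argument is a list, as a list-valued parameter like :pretraining-ids would require, and shows that a bare 2 is rejected while the quoted list '(2) is accepted. The quote matters too: without it, (2) would be evaluated as a function call and signal an error.

```lisp
;;; Hypothetical helper, NOT part of IDyOM: it only mimics how a
;;; list-valued parameter such as :pretraining-ids might be consumed.
(defun sketch-describe-pretraining (pretraining-ids)
  "Describe the dataset ids in PRETRAINING-IDS, which must be a list."
  ;; CHECK-TYPE signals a correctable TYPE-ERROR for a non-list such as 2.
  (check-type pretraining-ids list)
  ;; ~{ ~} iterates over the list; ~^ omits the separator after the last id.
  (format nil "Pretraining on dataset(s) ~{~a~^, ~}" pretraining-ids))

(sketch-describe-pretraining '(2))    ; => "Pretraining on dataset(s) 2"
(sketch-describe-pretraining '(2 3))  ; => "Pretraining on dataset(s) 2, 3"
;; (sketch-describe-pretraining 2)    ; signals a TYPE-ERROR: 2 is not a LIST
```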

Tom

ecm2021 commented 3 years ago

Hi Tom,

Ah, amazing! That's working now, thank you so much!

And just to clarify: when using the long-term model for within-culture analysis, would (idyom:idyom 1 '(onset) '(ioi-ratio) :models :ltm :k 1 :detail 2) suffice, or should I also set the pretraining ids (e.g. (idyom:idyom 1 '(onset) '(ioi-ratio) :models :ltm :k 1 :pretraining-ids '(1) :detail 2))? Sorry if this sounds like an incredibly obvious question!

All the best, Eden

Kappers commented 3 years ago

> when using the long-term model for within-culture analysis, would (idyom:idyom 1 '(onset) '(ioi-ratio) :models :ltm :k 1 :detail 2) suffice, or should I also set the pretraining ids (e.g. (idyom:idyom 1 '(onset) '(ioi-ratio) :models :ltm :k 1 :pretraining-ids '(1) :detail 2))? Sorry if this sounds like an incredibly obvious question!

For within-culture analysis, you will be training and testing on the same dataset, so pretraining (on some other dataset) is not necessary. You can create training/test sets from the dataset through cross-validation, using :k 10 (the default) or some other number greater than 1; this is consistent with Pearce (2018), as you previously pointed out.

So there is no need for the pretraining-ids argument, but make sure to use a value of :k above 1, otherwise the LTMs will not be trained at all! This is described on the wiki, which I advise you to read carefully: https://github.com/mtpearce/idyom/wiki/IDyOM-Parameters#training-parameters
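To make the difference between these settings concrete, here is a minimal Common Lisp sketch of which compositions reach the LTM when one cross-validation fold is tested. It is an illustration under stated assumptions, not IDyOM's implementation: the function names are hypothetical, compositions are stand-in symbols (A1 to A10 for the culture being analysed, B1 to B3 for the pretraining culture), and folds are assigned round-robin by position, whereas IDyOM may partition a dataset differently.

```lisp
;;; Minimal sketch, NOT IDyOM's implementation: which compositions feed the
;;; long-term model (LTM) when fold FOLD of a K-fold split is the test set.
;;; Assumption: folds are assigned round-robin by position in the list.

(defun sketch-training-part (target-compositions k fold)
  "Return the compositions of TARGET-COMPOSITIONS that are NOT in FOLD.
With K = 1 every composition falls in fold 0, so nothing is left over."
  (loop for composition in target-compositions
        for position from 0
        unless (= (mod position k) fold)
          collect composition))

(defun sketch-ltm-training-set (target-compositions pretraining-compositions k fold)
  "LTM training material: the pretraining corpus plus the non-test folds."
  (append pretraining-compositions
          (sketch-training-part target-compositions k fold)))

(defparameter *culture-a* '(a1 a2 a3 a4 a5 a6 a7 a8 a9 a10)) ; corpus analysed
(defparameter *culture-b* '(b1 b2 b3))                       ; pretraining corpus

;; Within-culture, k = 10, no pretraining: testing A1 trains on A2 to A10.
(sketch-ltm-training-set *culture-a* '() 10 0)
;; => (A2 A3 A4 A5 A6 A7 A8 A9 A10)

;; Within-culture, k = 1, no pretraining: the LTM is never trained.
(sketch-ltm-training-set *culture-a* '() 1 0)
;; => NIL

;; Between-culture, k = 1, pretraining on B: the LTM sees only culture B.
(sketch-ltm-training-set *culture-a* *culture-b* 1 0)
;; => (B1 B2 B3)

;; Between-culture with k = 10: culture A leaks into the training data.
(sketch-ltm-training-set *culture-a* *culture-b* 10 0)
;; => (B1 B2 B3 A2 A3 A4 A5 A6 A7 A8 A9 A10)
```

The last case shows why :k 1 with pretraining suits the between-culture predictions: in this sketch, any :k above 1 mixes part of the analysed corpus into the LTM's training data.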

I hope that helps? Please do feel free to close the issue if you think it is resolved 👍

Tom

mtpearce commented 3 years ago

> Warning - when using :pretraining-ids, I am not sure if viewpoint selection optimizes the source viewpoints for the pretraining dataset, analysis dataset or both. @mtpearce might be able to clarify this.

Just confirming for the record that viewpoint selection optimises viewpoints for models trained on both the pretraining and target datasets.

I'm closing this issue now.

mtpearce / idyom

Training sets coding #43