Inquiry on convert_apc_set_to_atepc_set method and a few questions.

kerolzeeq commented 2 years ago

Hello @yangheng95 ,

Thanks for making and maintaining this repo, it's great!

I'm currently using pyabsa-1.16.14 and I've been running into a problem with the convert_apc_set_to_atepc_set method. The following happens with the BIO tagging:

Outside term (O) is labelled with O 999 instead of O -999

Do you have any idea why this is happening? Might there be any changes to the code?

Next, I have a few questions, if you don't mind:

What is the best way to split a labelled APC dataset into training set, validation set and test set? Is there some kind of exact way for splitting/sampling in ABSA?
I'm fairly new to the concept of Transformers, and I'm curious how checkpoints work. How do I load a checkpoint when I want to test the model on a new sentence?

Thanks a lot!

yangheng95 commented 2 years ago

It is a bug, which should be fixed in 1.16.15. For the usage of checkpoints, please see the examples in the demo folder.
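For anyone who already generated ATEPC files with 1.16.14, a quick check along these lines may help locate the affected rows. This is only a sketch: it assumes each non-blank line of the converted file has three whitespace-separated fields (token, BIO tag, polarity), and find_bad_outside_labels is a hypothetical helper, not part of PyABSA.

```python
def find_bad_outside_labels(lines, expected="-999"):
    """Return (line_number, token, polarity) for every O-tagged token
    whose polarity field is not the expected placeholder.

    Assumption: each data line looks like "token TAG polarity";
    blank lines or lines with another shape are skipped.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 3:
            continue  # sentence separator or unexpected layout
        token, tag, polarity = parts
        if tag == "O" and polarity != expected:
            bad.append((line_no, token, polarity))
    return bad


# Hypothetical sample showing one wrongly labelled row ("O 999").
sample = [
    "The O -999",
    "food B-ASP Positive",
    "is O 999",
    "",
    "great O -999",
]
bad_rows = find_bad_outside_labels(sample)
print(bad_rows)
```

To check a real file, read it with open(path, encoding="utf-8") and pass the lines to the function; an empty result means no O-tagged token carries a polarity other than -999.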

kerolzeeq commented 2 years ago

I see, thanks for the clarification on the bug and checkpoints @yangheng95

I think you missed one of my questions:

What is the best way to split a labelled APC dataset into training set, validation set and test set? Is there some kind of exact way for splitting/sampling in ABSA?

I also have a different question: is there a way to retrieve the list of losses and accuracies per epoch during training?

yangheng95 commented 2 years ago

There is no single best way to split a dataset; the training, testing and validation sets are usually split 8:1:1 or 7:2:1. You can also modify the code to obtain the list of losses and accuracies. E.g., in the APC trainer, you can get the losses at https://github.com/yangheng95/PyABSA/blob/e257c5cea062de57eab4a71ed732043f58bf73e9/pyabsa/core/apc/training/apc_trainer.py#L228
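A random 8:1:1 split could be sketched as below. It assumes the APC file stores each sample as three consecutive lines (sentence with the $T$ placeholder, aspect term, polarity), so the three lines are kept together while shuffling. split_apc_samples is a hypothetical helper, not a PyABSA function, and the ratios and seed are only illustrative.

```python
import random


def split_apc_samples(lines, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle APC samples and split them into train/valid/test lists.

    Assumption: every sample occupies exactly three consecutive lines.
    The test set receives whatever remains after train and valid.
    """
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of 3 lines (3 lines per sample)")
    # Group the flat list of lines into 3-line samples.
    samples = [lines[i:i + 3] for i in range(0, len(lines), 3)]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(samples)
    n_train = int(len(samples) * ratios[0])
    n_valid = int(len(samples) * ratios[1])
    train = samples[:n_train]
    valid = samples[n_train:n_train + n_valid]
    test = samples[n_train + n_valid:]
    return train, valid, test


# Ten hypothetical samples, i.e. thirty lines.
demo_lines = []
for i in range(10):
    demo_lines += [f"sentence {i} about $T$", f"aspect{i}", "Positive"]

train, valid, test = split_apc_samples(demo_lines)
print(len(train), len(valid), len(test))
```

Passing ratios=(0.7, 0.2, 0.1) gives the 7:2:1 variant. If the polarity classes are heavily imbalanced, splitting each class separately and merging the parts is a common alternative.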

kerolzeeq commented 2 years ago

Thank you! 😀

yangheng95 / PyABSA

Inquiry on convert_apc_set_to_atepc_set method and a few questions. #220