The EDA is almost finished. I can now retrieve images without alt text and HTML files without lang attributes. These will be my main focus points for the repairs.
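A minimal sketch of what such a check could look like. It assumes the EPUB is read directly as a zip archive and uses only Python's standard library (`zipfile`, `html.parser`). The class and function names are hypothetical and this is not necessarily the tooling used in the EDA. An `<img>` with an empty `alt=""` is counted as having alt text, since that is the conventional marker for decorative images.

```python
import zipfile
from html.parser import HTMLParser


class AccessibilityScanner(HTMLParser):
    """Hypothetical scanner: collects <img> tags without alt and checks <html> for a language."""

    def __init__(self):
        super().__init__()
        self.images_without_alt = []
        self.html_has_lang = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            # EPUB (XHTML) content documents may declare the language via lang or xml:lang
            self.html_has_lang = bool(attributes.get("lang") or attributes.get("xml:lang"))
        elif tag == "img" and "alt" not in attributes:
            self.images_without_alt.append(attributes.get("src", "?"))

    # Self-closing XHTML tags such as <img/> go through handle_startendtag,
    # which by default calls handle_starttag, so they are covered as well.


def scan_document(markup: str):
    """Return (list of img srcs without alt, whether <html> declares a language)."""
    scanner = AccessibilityScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.images_without_alt, scanner.html_has_lang


def scan_epub(path):
    """Return a per-file report for every (X)HTML file inside an EPUB archive."""
    report = {}
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = epub.read(name).decode("utf-8", errors="replace")
                missing_alt, has_lang = scan_document(markup)
                report[name] = {"images_without_alt": missing_alt, "missing_lang": not has_lang}
    return report
```

Running `scan_epub("book.epub")` on a file from the dataset would give, per content file, the images to repair and whether a `lang` attribute still needs to be added.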
I chose not to use Robin’s dataset because it contains no mistakes (probably because it is a paid dataset). I am now building my own dataset using LibGen and a GitHub plugin that one of the group members sent me.
To do:
Today I am going to finish my EDA (mostly fixing the document structure). I will put this in my milestone folder when it is done.
Next, I will start fixing the EPUBs, based on the earlier preparations.
Remarks:
The coming week might be less productive due to exams and deadlines. After that, my full attention will return to the thesis work.