Closed aubertc closed 6 months ago
After some testing, my hypothesis turns out not to be valid. Maybe the program simply assumes that the title comes after the author? I am really not sure what is happening.
The problem can be seen in the output above, in the entry with the author 'Data Integration for the Study of Outstanding Productivity in Biomedical Research'.
For this problem, I am not sure why it is skipping the first sheet within the test_input, but I do know why the output is the way it is. The program does not assume the title column comes after the author, as it has to specifically search for the title column for each sheet.
The output comes from whatever is in the author/researcher column. However, I cannot make sense of why it chose 'Data Integration for the Study of Outstanding Productivity in Biomedical Research'. As I remember it, the program chooses the first element in the data gathered from each Excel sheet.
This can probably be fixed easily by tracking the index of the author column and the index of the title column separately, so that this mistake is not made. I overlooked this possibility because of the simplicity of the tests we made.
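To make the idea concrete, here is a minimal sketch of what tracking the two indices separately could look like. It is not the project's actual code: the helper names (`find_column`, `extract_pairs`), the header spellings, and the sample data are all made up for illustration. It also assumes each sheet is already loaded as a list of rows with the header in the first row.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical header spellings; the real spreadsheets may use others.
AUTHOR_HEADERS = {"author", "researcher"}
TITLE_HEADERS = {"title"}


def find_column(header: Sequence[object], names: Set[str]) -> Optional[int]:
    """Return the index of the first header cell whose text is in `names`."""
    for index, cell in enumerate(header):
        if str(cell).strip().lower() in names:
            return index
    return None


def extract_pairs(
    sheets: Dict[str, List[List[object]]]
) -> List[Tuple[str, object, object]]:
    """Collect (sheet, author, title) tuples, locating both columns per sheet."""
    pairs = []
    for sheet_name, rows in sheets.items():
        if not rows:
            continue  # empty sheet: nothing to read
        header, body = rows[0], rows[1:]
        # Both indices are looked up afresh for every sheet and kept in
        # separate variables, so one can never be mistaken for the other.
        author_col = find_column(header, AUTHOR_HEADERS)
        title_col = find_column(header, TITLE_HEADERS)
        if author_col is None or title_col is None:
            continue  # skip sheets that lack either column
        for row in body:
            # Ignore rows too short to contain both cells.
            if max(author_col, title_col) < len(row):
                pairs.append((sheet_name, row[author_col], row[title_col]))
    return pairs


# Made-up data: the two sheets list their columns in a different order.
sheets = {
    "Sheet1": [["Author", "Title"], ["A. Author", "First title"]],
    "Sheet2": [["Title", "Researcher"], ["Second title", "B. Author"]],
}
result = extract_pairs(sheets)
```

With the indices resolved per sheet, the second sheet's swapped column order should no longer cause a title to be reported as an author.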
Ok, thanks. Can you try to implement that?
This commit (https://github.com/popbr/abstract_doi_finder/commit/d9fa1be60e19993c4eac0917cd543ae1ef43f66d) should fix the issue we were having. Additionally, it should fix some of the other issues that we were having.
Please take a look when you have the time.
Consider the following sheet:
test_input.xlsx
the output is
As we can see, starting with the second sheet, the program reports the title as the author. My guess is that the program assumes the structure of the cells to be fixed, that is, if the author was in column X in the first sheet, then it will always be in column X in the next sheet(s).