Closed ljo closed 9 months ago
I've cloned @ljo's branch and the file names look fine.
I'm worried about the merge causing conflicts etc. I did a test with a dummy repo and changed the file names of three files (`file-one` --> `file_1`) using `git mv`. When I merged with the master branch, only one of the three was recognized as 'renamed'. The other two were marked as "deleted by them"/"new file".
Some googling gets me a Stack Overflow answer: "Git will automatically detect the move/rename if your modification is not too severe. ... 'not too severe' means that the new file and old file are >50% 'similar' based on some similarity indexes that git uses" (my emphasis). Padding the numbers in our file names is probably similar enough, but it's not completely transparent what's going on under the hood.
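As a rough illustration of the ">50% similar" idea in the quoted answer, the sketch below compares the contents of an old and a new file and checks whether they clear a 50% threshold. It uses Python's `difflib.SequenceMatcher`, which is not the similarity index Git itself computes, and the helper names and sample contents are made up for the example. As far as I understand it, Git's comparison is on file contents rather than file names, so a pure `git mv` should score as identical, and it would be edits to the same files on the other branch that lower the score.

```python
from difflib import SequenceMatcher

# Mirrors the 50% figure from the quoted answer; an assumption for this sketch.
THRESHOLD = 0.5


def similarity(old_text: str, new_text: str) -> float:
    """Return a 0..1 similarity ratio between two file contents, line by line.

    This is difflib's ratio, used here only as a stand-in for Git's own index.
    """
    return SequenceMatcher(None, old_text.splitlines(), new_text.splitlines()).ratio()


def looks_like_rename(old_text: str, new_text: str, threshold: float = THRESHOLD) -> bool:
    """Hypothetical helper: True if the two contents are more than `threshold` similar."""
    return similarity(old_text, new_text) > threshold


# Made-up file contents for illustration.
original = "\n".join(f"line {i}" for i in range(10))
lightly_edited = original.replace("line 3", "line three")
heavily_edited = "\n".join(f"other {i}" for i in range(10))

print(looks_like_rename(original, original))        # unchanged content
print(looks_like_rename(original, lightly_edited))  # one line changed out of ten
print(looks_like_rename(original, heavily_edited))  # nothing in common
```

If this picture is right, the dummy-repo result above would be explained by the content of the two "deleted"/"new" files having diverged between the branches rather than by how different the names are.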
To avoid potential issues with other ongoing work, I suggest we merge this only when there are no branches with ongoing work involving protocol files -- i.e. when all other branches are merged into dev.
I think this sounds like a plan, i.e. take this as the last PR. It might mean some additional work for @ljo. Are you OK with this?
Yes, with the amendment of the decision on today's meeting.
I was a little premature in the meeting today. I want to raise a couple points for discussion on this issue:
* @ljo, you removed `höst` from fall sessions' file names. I don't think we should do that. (Getting the `ö` out of the file name is good though). I looked for other instances of these 'specifiers' getting removed from file names, but I didn't see others -- did you take out anything else from file names?
* 1892, 1905: zero padding didn't take effect on urtima and urtima2 sessions.

If we can decide on `höst` and get those 1892/1905 urtima protocols zero padded, then I think this can be merged today. All my stuff touching protocols has been merged.
I was a little premature in the meeting today. I want to raise a couple points for discussion on this issue:
* @ljo, you removed `höst` from fall sessions' file names. I don't think we should do that. (Getting the `ö` out of the file name is good though). I looked for other instances of these 'specifiers' getting removed from file names, but I didn't see others -- did you take out anything else from file names?
No, I only removed `höst`, since that was the only one of these specifiers that was in the general sequence; all the others had their own sequences.
* 1892, 1905: zero padding didn't take effect on urtima and urtima2 sessions.
Fixing now
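For reference, a minimal sketch of how the zero padding could be checked or applied by script. The file name pattern (a stem ending in `--`, then the protocol number, then `.xml`), the example names, and the pad width of 3 are assumptions for illustration and not taken from the repo.

```python
import re

# Assumed pattern: any stem ending in a double hyphen, a protocol number, then ".xml".
_NUMBER_AT_END = re.compile(r"^(?P<stem>.+--)(?P<number>\d+)(?P<ext>\.xml)$")


def zero_pad(file_name: str, width: int = 3) -> str:
    """Return file_name with its trailing protocol number padded to `width` digits."""
    match = _NUMBER_AT_END.match(file_name)
    if match is None:
        # Names that do not fit the assumed pattern are left untouched.
        return file_name
    return f"{match['stem']}{match['number'].zfill(width)}{match['ext']}"


def unpadded(file_names, width: int = 3):
    """List the names whose protocol number is not yet padded to `width` digits."""
    return [name for name in file_names if zero_pad(name, width) != name]


# Hypothetical names, only to show the behaviour.
names = ["prot-1892-urtima--7.xml", "prot-1892-urtima--007.xml"]
print(unpadded(names))
print(zero_pad("prot-1892-urtima--7.xml"))
```

Running `unpadded` over a directory listing would show which sessions the padding missed; the actual renames would still go through `git mv`.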
I just wonder if that can be relevant info for the mandate periods of the MPs. I had been using these specifiers categorically to determine the 'standard' start/end of parliament sessions. @fredrik1984 @MansMeg, what do you say? I could pull this info from elsewhere if we really want höst out of the filenames.
@BobBorges I am not sure what you mean here?
Preferably, I would also like `urtima2` out of these specifiers, since it looks like the other years with more than one `urtima` are still in the same sequence. But I did not change this now. For `höst` we also have the robustness perspective, which speaks in favour of its removal.
@fredrik1984 I just mean that I treated it the same way as urtima -- that höst sessions have their own start and end dates.
@ljo. I guess so long as we don't get rid of this info completely it can be removed from the file names. Right now it's still in the TEI/text/front/div/head element and pb facs attrib....
Yes – höst/vår/lagtima/urtima/a/b riksdag meetings should be treated in the same way.
For mandate periods of MPs, govs, etcetera, I think the metadata should be used. Currently, different types of specifiers are used in the filenames, with some small variations. Yes, the identifiers in the documents are still the same. I only want the filenames changed, for the previously stated reasons and to get a) robustness, b) clarity, and, very minor, c) documents that stay in date order without having to look at the dates inside (which requires parsing).
Yes – höst/vår/lagtima/urtima/a/b riksdag meetings should be treated in the same way.
@fredrik1984 Could you please elaborate a bit on this?
We should add start/end dates for höst/vår/lagtima/urtima/a/b riksdag meetings using Lotta's curated list (attached here).
I don't have a strong feeling about the filenames, but I have been using the file names as a convenient way to fetch these periods -- I think we should either keep all the specifiers or none (in the latter case we need some additional means to store the specifier data -- either in xml or in csv files).
We should add start/end dates for höst/vår/lagtima/urtima/a/b riksdag meetings using Lotta's curated list (attached here).
OK, yes, so the very first date I picked out of the list seems wrong though. 1967 `vår` has start value `1967-06-10` but should be `1967-01-10`.
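If the specifiers do end up leaving the file names, one way to keep the session periods fetchable is a small csv keyed on year and specifier. The sketch below is only that: the column names and the `load_sessions` helper are hypothetical, and the single row uses the corrected 1967 vår start date from this thread, with the end date left blank because it isn't given here.

```python
import csv
import io
from datetime import date

# Hypothetical layout. The only value taken from this thread is the corrected
# 1967 vår start date; the end date is left empty on purpose.
SESSIONS_CSV = """year,specifier,start,end
1967,vår,1967-01-10,
"""


def load_sessions(csv_text: str):
    """Map (year, specifier) to a (start, end) pair of dates; missing dates become None."""
    sessions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = date.fromisoformat(row["start"]) if row["start"] else None
        end = date.fromisoformat(row["end"]) if row["end"] else None
        sessions[(row["year"], row["specifier"])] = (start, end)
    return sessions


sessions = load_sessions(SESSIONS_CSV)
print(sessions[("1967", "vår")])
```

Parsing the dates on load also means a malformed value in the curated list fails loudly, although a plausible-but-wrong date like the one above would still need a manual check.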
@MansMeg GH review seems impossible b/c the diffs for ∞ files won't load in the browser. Aside from that, the question about whether or not to keep `höst` in the file names is not really answered clearly for me. If you're OK with that, please merge.
I've cloned @ljo's fork and looked at the changes locally -- changes are what we expect.
Okay, great! Then I'll wait for the tests to run and then merge.
`prot-197677--.xml` and `prot-197778--.xml`