paul-shannon / slexil

Software Linking Elan Xml to Illuminated Language
MIT License
0 stars 1 forks

calling AudioExtractor from Text object instead of webapp.py #38

Closed davidjamesbeck closed 5 years ago

davidjamesbeck commented 5 years ago

I'm starting this issue mainly with some thoughts here. I don't know if you still want to take this on (you're more than welcome), but I wonder if this isn't an opportunity to think a bit more about user workflow.

Currently, the issue is that webapp.py needs the user to select and parse a sound file before proceeding to Create Text, which is where the constructor for the Text object lives. That means we have to construct the AudioExtractor object first, which has become a problem since Text needs a startStopTable so it knows the time codes associated with each of the clips AudioExtractor makes. At the moment, webapp stores these codes and passes them as an argument to Text(), but Text() already has a pretty complex signature, so it might be worth trying to fix that. It also makes more sense to me to have Text directly associated with an AudioExtractor object, since that means a Text object encapsulates all the information about the project in a single object.

So, I was thinking a good option might be changing the workflow so that the recording is validated and parsed at the same time as the Text object is created (that is, the constructor for AudioExtractor is called in the init for Text). Errors from AudioExtractor could be passed to the same TextArea that reports the errors from building texts, and the steps leading up to Create Text could be streamlined to just identifying the sound file (but not parsing it yet). This makes things a bit more user friendly, at least when all goes well.
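A minimal sketch of that workflow, assuming simplified stand-in classes: the real Text and AudioExtractor constructors take different arguments, and the extension check below is only a placeholder for real validation. It shows Text building its own AudioExtractor and collecting any audio errors next to its own.

```python
class AudioExtractor:
    """Hypothetical stand-in: validates the recording named by the user."""

    def __init__(self, audio_filename, eaf_filename):
        self.audio_filename = audio_filename
        self.eaf_filename = eaf_filename

    def validate(self):
        # Placeholder check; the real class would inspect the sound file itself.
        if not self.audio_filename.lower().endswith(".wav"):
            raise ValueError(f"unsupported audio file: {self.audio_filename}")


class Text:
    """Hypothetical stand-in: owns its AudioExtractor instead of receiving a table."""

    def __init__(self, eaf_filename, audio_filename):
        self.errors = []
        self.audio = AudioExtractor(audio_filename, eaf_filename)
        try:
            self.audio.validate()
        except ValueError as err:
            # Kept here so the web app can report them with the text-building errors.
            self.errors.append(str(err))


text = Text("story.eaf", "story.mp3")
print(text.errors)
```

With this shape, webapp.py would only need to hand Text the two file names.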

Thoughts?

David

paul-shannon commented 5 years ago

Text needs a startStopTable

Hi David,

Not sure I understand this claim! Here’s why. Do set me straight.

The constructor for a Text object takes an xml (eaf) filename as its first argument. The per-line start and stop times are in that file.

To my way of thinking, therefore, Text objects can obtain all the timing information with some straightforward xml parsing.
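As a rough illustration of that parsing, here is a sketch using the standard library. The inline sample is a cut-down, made-up EAF document, the function name is hypothetical, and the sketch assumes every time slot carries a TIME_VALUE, which real files do not guarantee.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal EAF content: two time-aligned lines on one tier.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2100"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4000"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def line_times(eaf_xml):
    """Return (annotation_id, start_ms, end_ms) for each time-aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot id to its millisecond value.
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE")) for s in root.iter("TIME_SLOT")}
    times = []
    for ann in root.iter("ALIGNABLE_ANNOTATION"):
        start = slots[ann.get("TIME_SLOT_REF1")]
        end = slots[ann.get("TIME_SLOT_REF2")]
        times.append((ann.get("ANNOTATION_ID"), start, end))
    return times


print(line_times(EAF_SAMPLE))
```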

davidjamesbeck commented 5 years ago

Yes, it could, but since we already do it in AudioExtractor, it seems both redundant and costly, no?

paul-shannon commented 5 years ago

I think that programmer time and programmer cognitive burden are by far the greater expense! Redundant code can be avoided by refactoring repeated code into a single utility function, used by everyone who needs it.

Also relevant here is that the IjalLine class extracts start and stop times from the xml. That’s code I wrote long ago.

I think that the IjalLine class remains, as before, the proper place for per-line time information, and that nobody else should care how the times are obtained, or feel that they need to extract them redundantly.

In detail:

  • every text consists of some number of IjalLines
  • every IjalLine parses its start and stop times from its xml element
  • only IjalLines should know about start and stop times - I’m pretty sure of this
  • if the Text object needs start and stop times, it can get them from the IjalLines it holds

In programming we call this “getting the abstractions right”. And “maximum ignorance”. Does the Text class really need to know about IjalLine times, in any way that it cannot get them from the existing IjalLine logic?
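A minimal sketch of this layering, using hypothetical stand-in classes (the real IjalLine and Text have different constructors, and `get_start_stop` is an invented method name): each line owns its times, and Text only asks for them.

```python
class IjalLine:
    """Hypothetical stand-in: the only place that stores a line's times."""

    def __init__(self, line_number, start_ms, end_ms):
        self.line_number = line_number
        self._start_ms = start_ms
        self._end_ms = end_ms

    def get_start_stop(self):
        return self._start_ms, self._end_ms


class Text:
    """Hypothetical stand-in: holds lines and never parses times itself."""

    def __init__(self, lines):
        self.lines = lines

    def start_stop_table(self):
        # Built on demand by querying each line, so there is no second copy to maintain.
        return [(line.line_number, *line.get_start_stop()) for line in self.lines]


text = Text([IjalLine(1, 0, 1500), IjalLine(2, 2100, 4000)])
print(text.start_stop_table())
```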

davidjamesbeck commented 5 years ago

Hm. I will look at IjalLine and see how it is handling start/stop times. Text needs to know these because the automated playback needs a lookup table to track which line is being played at any given point in time, but it is probably possible to build that incrementally as each IjalLine object is created (the values for a given line could be passed back to Text) and then have that put into the HTML output in some way once all the lines are constructed (though this might be a problem because it would have to be early in the file, before all the lines are built).
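To make the lookup concrete, here is a sketch of the logic in Python. The actual playback code runs in the browser, and the table layout and function name here are assumptions. Given the current playback time, a binary search over the start times finds the line being played, or nothing when the time falls between lines.

```python
from bisect import bisect_right


def line_at(table, time_ms):
    """table: (line_number, start_ms, end_ms) tuples sorted by start_ms."""
    starts = [start for _, start, _ in table]
    # Index of the last line that starts at or before time_ms.
    i = bisect_right(starts, time_ms) - 1
    if i < 0:
        return None
    line_number, _, end = table[i]
    # A time past that line's end falls in a gap between lines.
    return line_number if time_ms < end else None


table = [(1, 0, 1500), (2, 2100, 4000)]
print(line_at(table, 700))
print(line_at(table, 1800))
print(line_at(table, 2100))
```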

Still seems like more work than fixing webapp.py, though. I’ll try to find some time tomorrow or Sunday to evaluate this and maybe write a test based on some modified versions of Text and IjalLine to see how it goes.

David

davidjamesbeck commented 5 years ago

Hi, Paul

I couldn’t find anything in ijalLine.py that tracks time codes or makes any use of them, so instead I'm working on getting the table I need in Text() by importing and calling the determineStartAndEndTimes() method from AudioExtractor. This seems to work and all the unit tests now pass with the smaller signature for the Text() constructor except for test_text_inferno.py. This is one of the files you marked up with

<<<<<<< HEAD

=======

>>>>>>> master

notation. I don’t recognize that, but whatever it is it doesn’t play nice with my Python interpreter (or Xcode). I found some other places that had that and just commented it out, but in this particular file I can’t seem to do that in a way that lets the test pass.

Anyway, I’ve got all this on a new branch and I’ll put in a pull request so you can review it once I’ve adjusted webapp.py to make sure that the whole thing works properly.

Cheers,

David

paul-shannon commented 5 years ago

Hi David,

<<<<<<< HEAD and the like are inserted by git during a merge when your changes to a file conflict with what’s in the repo. Conflicting parts are marked off by these inserted lines. The standard response is to study the two sections, choose the lines you want to keep, and remove the markup.
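Git writes these markers as seven `<`, `=`, or `>` characters at the start of a line. A small helper like the following (a hypothetical sketch, not part of slexil) can list any that are left in a file before the tests are run. It is only a heuristic, since a line that legitimately starts with seven `=` would also be reported.

```python
def find_conflict_markers(source):
    """Return (line_number, marker) for every leftover git conflict marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.startswith(markers):
            hits.append((number, line[:7]))
    return hits


sample = "\n".join([
    "import unittest",
    "<<<<<<< HEAD",
    "x = 1",
    "=======",
    "x = 2",
    ">>>>>>> master",
])
print(find_conflict_markers(sample))
```

Commenting the marker lines out leaves both versions of the code in place, which is why a file treated that way can still fail.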

I think you will find start and end times extracted in ijalLine.py starting at line 304 in the buildTable method. They are returned to the caller in the table created in that method, and stored in the ijalLine object.

The times are fundamental to the ijalLine class - and no other object in the code needs to keep track of them.

It’s a bad idea for any other object to track them separately!

webapp.py has a Text object. A Text object has ijalLines. Each ijalLine has a start and stop time. If any other bit of code needs to know those times, it should query each individual ijalLine object for that information.

The essential programming principle is DRY: do not repeat yourself. If you DO repeat yourself, maintenance and future development of the code - and debugging when problems appear, as they always will - become increasingly nasty processes.

davidjamesbeck commented 5 years ago

Hi, Paul

Hmm. Maybe it is because I don’t understand Pandas, but when I ask IjalLine to print out the data frame created by buildTable(), I get

[4 rows x 13 columns]
  ANNOTATION_ID LINGUISTIC_TYPE_REF  ...  HAS_TABS  HAS_SPACES
0           a36          default-lt  ...     False        True
1           a51            phonemic  ...      True       False
2           a56         translation  ...     False        True
3          a194         translation  ...      True       False

I don’t see the timecodes here, but I do see that there are 13 columns somehow even though print() only shows 6. If you can tell me how to extract the start and stop times, I can write a getStartStop() method for IjalLine and have Text call it as it builds the HTML.
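For what it's worth, the `...` in that printout is pandas abbreviating a wide table, not missing data. Below is a sketch of how to see every column and pull out times. The `START` and `END` column names, the sample values, and the assumption that only the time-aligned row carries times are all guesses for illustration, not taken from ijalLine.py.

```python
import pandas as pd

nan = float("nan")
# Made-up table in the spirit of the printout above; START/END are assumed names.
tbl = pd.DataFrame({
    "ANNOTATION_ID": ["a36", "a51", "a56", "a194"],
    "LINGUISTIC_TYPE_REF": ["default-lt", "phonemic", "translation", "translation"],
    "START": [1200.0, nan, nan, nan],
    "END": [3400.0, nan, nan, nan],
    "HAS_TABS": [False, True, False, True],
    "HAS_SPACES": [True, False, True, False],
})

# List every column name, including the ones print() abbreviates.
print(tbl.columns.tolist())

# Or tell pandas not to abbreviate wide tables at all.
pd.set_option("display.max_columns", None)


def get_start_stop(table):
    """Return (start_ms, end_ms) from the first row that has both times."""
    timed = table.dropna(subset=["START", "END"])
    row = timed.iloc[0]
    return int(row["START"]), int(row["END"])


print(get_start_stop(tbl))
```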

BTW, AudioExtractor also tracks time codes, so there is repetition no matter which way we go with this.

David

paul-shannon commented 5 years ago

Hi David,

Thanks for your patience with all my preaching!

That AudioExtractor.py knows time codes is okay and does not violate DRY, because it is not part of the Text class, or any of the classes of which it is composed. I wrote it only as a utility - whose only job is to

The motivation for all of this, you may recall, is that we want to support “laptop local” use of the html versions of the stories. This means we do not expect a “byte range” webserver to be running, and so we have to use single-line audio clips, each played back in its entirety.

AudioExtractor is intended only to do this preparatory step: divide up files guided by the eaf.
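As an illustration of that preparatory step, here is a sketch using only the standard `wave` module on an in-memory recording. The function names and the silent test recording are invented for the example, and the real AudioExtractor may use a different audio library and file layout.

```python
import io
import wave


def silent_wav(duration_ms, rate=8000):
    """Build a mono, 16-bit silent recording in memory (test data only)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * (duration_ms * rate // 1000))
    return out.getvalue()


def extract_clip(wav_bytes, start_ms, end_ms):
    """Return a new WAV (as bytes) holding only start_ms..end_ms of the input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert millisecond offsets to frame positions.
        src.setpos(start_ms * rate // 1000)
        frames = src.readframes((end_ms - start_ms) * rate // 1000)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)  # the header's frame count is corrected on close
    return out.getvalue()


recording = silent_wav(4000)
clips = [extract_clip(recording, s, e) for s, e in [(0, 1500), (2100, 4000)]]
for clip in clips:
    with wave.open(io.BytesIO(clip), "rb") as w:
        print(w.getnframes())
```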

Could you remind me what part of webapp.py needs to know start and end times? Maybe this is asking (sorry if you already explained this): what is inadequate about implementing “entire story playback” by successive playback of each line, one at a time?

davidjamesbeck commented 5 years ago

Hi, Paul

The start and end times are needed by the JavaScript that controls animation during continuous playback. Webapp is fine without the start and stop times if Text can get them. On the master branch currently, webapp.py asks AudioExtractor for a startStopTable in the extractAudio methods and passes it to Text in the createWebPage method, but I undid that in the startStopTable branch, which assumes that Text gets the timecodes from its own methods rather than as an argument from createWebPage.

Playing the individual clips in sequence is inadequate (for a linguist) for several reasons. One is that it sounds jerky and unnatural, and (I suspect) things like loading time for clips will make it very uneven. It is also likely that many people will not parse the recording exhaustively in ELAN, so that the time codes that correspond to lines of text may leave gaps in the recording. Where there are pauses (or even asides!) in the story, the line-by-line playback will simply skip over those. Aside from erasing data (linguists are interested in pauses and asides), it will also sound unnatural to the trained ear. I also think community members will be sensitive to the program removing anything from the performance.
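The gaps in question are easy to compute from a start/stop table. This is a sketch with an assumed table layout and a hypothetical function name; it lists the stretches of the recording that no line covers, which is exactly what line-by-line playback would skip.

```python
def find_gaps(table, recording_end_ms):
    """table: (line_number, start_ms, end_ms) tuples; returns uncovered (start, end) spans."""
    gaps = []
    cursor = 0  # end of the audio covered so far
    for _, start, end in sorted(table, key=lambda row: row[1]):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if recording_end_ms > cursor:
        gaps.append((cursor, recording_end_ms))
    return gaps


table = [(1, 0, 1500), (2, 2100, 4000)]
print(find_gaps(table, 4500))
```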

I’m not averse to using IjalLine to get times to Text, I just don’t know how to do it :-( . I know I’ll need to learn Pandas someday (it looks like I could use it for several other things I’m working on), but it has been lower priority than all the rest of the stuff I’m playing catch-up with, and opaque enough that I haven’t been able to fake it like I do with JavaScript.

David

Could you remind me what part of webapp.py needs to know start and end times? Maybe this is asking (sorry if you already explained this): what is inadequate about implementing “entire story playback” by successive playback of each line, one at a time?

  • Paul

On Jun 29, 2019, at 7:30 AM, David Beck notifications@github.com wrote:

Hi, Paul

Hmm. Maybe it is because I don’t understand Pandas, but when I ask IjalLine to print out the data frame created by buildTable(), I get

[4 rows x 13 columns] ANNOTATION_ID LINGUISTIC_TYPE_REF ... HAS_TABS HAS_SPACES 0 a36 default-lt ... False True 1 a51 phonemic ... True False 2 a56 translation ... False True 3 a194 translation ... True False

I don’t see the timecodes here, but I do see that there are 13 columns somehow even though print() only shows 6. If you can tell me how to extract the start and stop times, I can write a getStartStop() method for IjalLine and have Text call it as it builds the HTML.

BTW, AudioExtractor also tracks time codes, so there is repetition no matter which way we go with this.

David

On Jun 29, 2019, at 8:17 AM, Paul Shannon notifications@github.com wrote:

Hi David,

<<<<<HEAD and etc are inserted by git when it detects differences between your changes to a file and what’s in the repo. Conflicting parts are marked off by these inserted lines. The standard response is to study the two sections, choose the lines you want to keep, remove the markup.

I think you will find start and end times extracted in ijalLine.py starting at line 304 in the buildTable method. They are returned to the caller in the table created in that method, and stored in the ijalLine object.

The times are fundamental to the ijalLine class - and no other object in the code needs to keep track of them.

It’s a bad idea for any other object to track them separately!

webapp.py has a Text object. A Text object has ijalLines. Each ijalLine has a start and stop time. If any other bit of code needs to know those times, they should query each individual ijalLine object for that information.
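That ownership chain can be sketched as follows. These are simplified stand-ins, not the real classes: the real IjalLine parses its times from the eaf xml, and the getter names and `getStartStopTable()` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IjalLine:
    """Simplified stand-in; the real class reads its times from the xml."""
    startTime: int
    endTime: int

    def getStartTime(self) -> int:  # assumed getter name
        return self.startTime

    def getEndTime(self) -> int:  # assumed getter name
        return self.endTime


class Text:
    """Holds the lines and asks them for their times instead of storing copies."""

    def __init__(self, lines: List[IjalLine]):
        self.lines = lines

    def getStartStopTable(self) -> List[Tuple[int, int, int]]:
        # (lineNumber, start, end), with 1-based line numbers (hypothetical helper)
        return [
            (number, line.getStartTime(), line.getEndTime())
            for number, line in enumerate(self.lines, start=1)
        ]


text = Text([IjalLine(0, 2500), IjalLine(3100, 5200)])
print(text.getStartStopTable())
```

With this shape the times live in one place only, and neither webapp.py nor the Text constructor needs a separate table passed in.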

The essential programming principle is DRY: do not repeat yourself. If you DO repeat yourself, maintenance and future development of the code - and debugging when problems appear, as they always will - become increasingly nasty processes.

  • Paul

On Jun 29, 2019, at 6:38 AM, David Beck notifications@github.com wrote:

Hi, Paul

I couldn’t find anything in ijalLine.py that tracks time codes or makes any use of them, so instead I'm working on getting the table I need in Text() by importing and calling the determineStartAndEndTimes() method from AudioExtractor. This seems to work and all the unit tests now pass with the smaller signature for the Text() constructor except for test_text_inferno.py. This is one of the files you marked up with

<<<<<HEAD

===========

master

notation. I don’t recognize that, but whatever it is it doesn’t play nice with my Python interpreter (or Xcode). I found some other places that had that and just commented it out, but in this particular file I can’t seem to do that in a way that lets the test pass.

Anyway, I’ve got all this on a new branch and I’ll put in a pull request so you can review it once I’ve adjusted webapp.py to make sure that the whole thing works properly.

Cheers,

David

On Jun 28, 2019, at 1:06 PM, David J Beck dbeck@ualberta.ca wrote:

Hm. I will look at IjalLine and see how it is handling start/stop times. Text needs to know these because the automated playback needs a lookup table to track which line is being played at any given point in time, but it is probably possible to build that incrementally as each IjalLine object is created (the values for a given line could be passed back to Text) and then have that put into the HTML output in some way once all the lines are constructed (though this might be a problem because it would have to be early in the file, before all the lines are built).
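The lookup described here, from a playback position to the line being played, can be sketched with a binary search over the start times. This is an illustration in Python only; the actual playback code is JavaScript, and the function name, the `(start_ms, end_ms)` table layout, and the sample values are assumptions. It also assumes the table is sorted by start time and the lines do not overlap.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def line_at(table: List[Tuple[int, int]], time_ms: int) -> Optional[int]:
    """Return the 1-based number of the line playing at time_ms, or None in a gap."""
    starts = [start for start, _ in table]
    # Index of the last line whose start is at or before time_ms.
    i = bisect_right(starts, time_ms) - 1
    if i >= 0 and time_ms < table[i][1]:
        return i + 1
    return None


# Made-up time codes with a pause between lines 1 and 2.
table = [(0, 2500), (3100, 5200), (5200, 8000)]
for t in (1000, 2800, 5200, 9000):
    print(t, line_at(table, t))
```

Returning None for a gap lets the caller leave the highlight unchanged during pauses and asides while the audio keeps playing.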

Still seems like more work than fixing webapp.py, though. I’ll try to find some time tomorrow or Sunday to evaluate this and maybe write a test based on some modified versions of Text and IjalLine to see how it goes.

David

On Jun 28, 2019, at 12:50 PM, Paul Shannon notifications@github.com wrote:

I think that programmer time and programmer cognitive burden are by far the greater expense! Redundant code can be avoided by refactoring repeated code into a single utility function, used by everyone who needs it.

Also relevant here is that the IjalLine class extracts start and stop times from the xml. That’s code I wrote long ago.

I think that the IjalLine class remains, as before, the proper place for per-line time information. And that nobody else should care how they are obtained, or feel that they need to extract them - redundantly.

In detail:

  • every text consists of some number of IjalLines
  • every IjalLine parses its start and stop times from its xml element
  • only IjalLines should know about start and stop times - I’m pretty sure of this
  • If the Text object needs start and stop times, it can get them from the IjalLines it holds

In programming we call this “getting the abstractions right”. And “maximum ignorance”. Does the Text class really need to know about IjalLine times, in any way that it cannot get them from the existing IjalLine logic?

  • Paul

On Jun 28, 2019, at 10:49 AM, David Beck notifications@github.com wrote:

Yes, it could, but since we already do it in AudioExtractor, it seems both redundant and costly, no?


paul-shannon commented 5 years ago

I just committed minor changes to ijalLine.py with these new methods:

  • getStartTime()
  • getEndTime()

tested (and therefore demonstrated) in test_IjalLine.py, “test_getStartStopTimes()” on lokono line 3

Let me know if this does the trick.

On Jun 29, 2019, at 8:19 AM, David Beck notifications@github.com wrote:

Hi, Paul

The start and end times are needed by the JavaScript that controls animation during continuous playback. Webapp is fine without the start and stop times if Text can get them. On the master branch currently, webapp.py asks AudioExtractor for a startStopTable in the extractAudio methods and passes it to Text in the createWebPage method, but I undid that in the startStopTable branch, which assumes that Text gets the timecodes from its own methods rather than as an argument from createWebPage.


davidjamesbeck commented 5 years ago

Will do (thanks). Now, how do I get your amendments to ijalLine.py and test_IjalLine.py using git, keeping in mind that the file you amended may have been amended on my end since my last push, since I was studying the buildTable method—that is, how can I tell git to replace my amended version with yours?

David


paul-shannon commented 5 years ago

My usual approach, so as not to completely lose local changes:

    mv ijalLine.py ijalLine.py-28jun-workInProgress
    git pull

I have gotten myself confused: when I run AudioExtractor, it numbers files from zero, not one. But I think we have always used 1-based counting before. I see that all of the test files (lovely to see so many of them!) end up with the 1-based wav files. Do you know what I am missing?

On Jun 29, 2019, at 8:49 AM, David Beck notifications@github.com wrote:

Will do (thanks). Now, how do I get your amendments to ijalLine.py and test_IjalLine.py using git, keeping in mind that the file you amended may have been amended on my end since my last push, since I was studying the buildTable method—that is, how can I tell git to replace my amended version with yours?

David

On Jun 29, 2019, at 9:44 AM, Paul Shannon notifications@github.com wrote:

I just committed minor changes to ijalLine.py with these new methods:

getStartTime() getEndtimme()

tested (and therefore demonstrated) in test_IjalLine.py, “test_getStartStopTimes()” on lokono line 3

Let me know if this does the trick.

  • Paul

On Jun 29, 2019, at 8:19 AM, David Beck notifications@github.com wrote:

Hi, Paul

The start and end times are needed by the Javascript that controls animation during continuous playback. Webapp is fine without the start and stop times if Text can get them. On the Master branch currently, webapp.py asks AudioExtractor for a startStopTable in the extractAudio methods and passes it to Text in the createWebPage method, but I undid that in the startStopTable branch, which assumes that Text get the timecodes from its own methods rather than as an argument from createWebPage.

Playing the individual clips in sequence is inadequate (for a linguist) for several reasons. One is that it sounds jerky and unnatural, and (I suspect) things like loading time for clips will make it very uneven. It is also likely that many people will not parse the recording exhaustively in ELAN, so that the time codes that correspond to lines of text may leave gaps in the recording. Where there are pauses (or even asides!) in the story, the line-by-line playback will simply skip over those. Aside from erasing data (linguists are interested in pauses and asides), it will also sound unnatural to the trained ear. I also think community members will be sensitive to the program removing anything from the performance.

I’m not averse to using IjalLine to get times to Text, I just don’t know how to do it :-( . I know I’ll need to learn Pandas someday (it looks like I could use it for several other things I’m working on), but it has been lower priority than all the rest of the stuff I’m playing catch-up with, and opaque enough that I haven’t been able to fake it like I do with Javascript.

David

Could you remind me what part of webapp.py needs to know start and end times? Maybe this is asking (sorry if you already explained this): what is inadequate about implementing “entire story playback” by successive playback of each line, one at a time?

  • Paul

On Jun 29, 2019, at 7:30 AM, David Beck notifications@github.com wrote:

Hi, Paul

Hmm. Maybe it is because I don’t understand Pandas, but when I ask IjalLine to print out the data frame created by buildTable(), I get

[4 rows x 13 columns] ANNOTATION_ID LINGUISTIC_TYPE_REF ... HAS_TABS HAS_SPACES 0 a36 default-lt ... False True 1 a51 phonemic ... True False 2 a56 translation ... False True 3 a194 translation ... True False

I don’t see the timecodes here, but I do see that there are 13 columns somehow even though print() only shows 6. If you can tell me how to extract the start and stop times, I can write a getStartStop() method for IjalLine and have Text call it as it builds the HTML.

BTW, AudioExtractor also tracks time codes, so there is repetition no matter which way we go with this.

David

On Jun 29, 2019, at 8:17 AM, Paul Shannon notifications@github.com wrote:

Hi David,

<<<<<HEAD and etc are inserted by git when it detects differences between your changes to a file and what’s in the repo. Conflicting parts are marked off by these inserted lines. The standard response is to study the two sections, choose the lines you want to keep, remove the markup.

I think you will find start and end times extracted in ijalLine.py starting at line 304 in the buildTable method. They are returned to the caller in the table created in that method, and stored in the ijalLine object.

The times are fundamental to the ijalLine class - and no other object in the code needs to keep track of them.

It’s a bad idea for any other object to track them separately!

webapp.py has a Text object. A Text object has ijalLines. Each ijalLine has a start and stop time. If any other bit of code needs to know those times, they should query each individual ijalLine object for that information.

The essential programming principle is DRY: do not repeat yourself. If you DO repeat yourself, maintenance and future development of the code - and debugging when problems appear, as they always ill - become increasingly nasty processes.

  • Paul

On Jun 29, 2019, at 6:38 AM, David Beck notifications@github.com wrote:

Hi, Paul

I couldn’t find anything in IJALline.py that tracks time codes or makes any use of them, so instead I'm working on getting the table I need in Text() by importing and calling the determineStartAndEndTimes() method from AudioExtractor. This seems to work and all the unit tests now pass with the smaller signature for the Text() constructor except for test_text_inferno.py. This is one of the files you marked up with

<<<<<HEAD

===========

master

notation. I don’t recognize that, but whatever it is it doesn’t play nice with my Python interpreter (or Xcode). I found some other places that had that and just commented it out, but in this particular file I can’t seem to do that in a way that lets the test pass.

Anyway, I’ve got all this on a new branch and I’ll put in a pull request so you can review it once I’ve adjusted webapp.py to make sure that the whole thing works properly.

Cheers,

David

On Jun 28, 2019, at 1:06 PM, David J Beck dbeck@ualberta.ca wrote:

Hm. I will look at IJALine and see how it is handling start/stop times. Text needs to know these because the automated playback needs a lookup table to track what line is being played at any given point in time, but it is probably possible to build that incrementally as each IJALline object is created (the values for a given line could be passed back to Text) and then have that put into the HTML output in some way once all the lines are constructed (though this might be a problem because it would have to be early in the file, before all the lines are built).

Still seems like more work than fixing webapp.py, though. I’ll try to find some time tomorrow or Sunday to evaluate this and maybe write a test based on some modified versions of Text and IJALline to see how it goes.

David

On Jun 28, 2019, at 12:50 PM, Paul Shannon notifications@github.com wrote:

I think that programmer time and programmer cognitive burden are by far the greater expense! Redundant code can be avoided by refactoring repeated code into a single utility function, used by everyone who needs it.

Also relevant here is that the IjalLine class extracts start and stop times from the xml. That’s code I wrote long ago.

I think that the IjalLine class remains, as before, the proper place for per-line time information. And that nobody else should care how they are obtained, or feel that they need to extract them - redundantly.

In detail:

  • every text consists of some number of IjalLines
  • every IjalLine parses its start and stop times from its xml element
  • Only Ijal lines should know about start and stop times - I’m pretty sure of this
  • If the Text object needs start and stop times, it can get them from the IjalLines it holds

In programming we call this “getting the abstractions right”. And “maximum ignorance”. Does the Text class really need to know about IjalLine times, in any way that it cannot get them from the existing IjalLine logic?
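As a rough illustration of a line reading its own times from the xml, the sketch below resolves an annotation's time-slot references against the `TIME_ORDER` table. The element and attribute names follow the ELAN EAF format, but the sample document is made up and the helper `lineTimes` is a hypothetical name, not the IjalLine implementation.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Made-up, minimal EAF-like document used only for this sketch.
EAF_SNIPPET = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def lineTimes(doc: ET.Element, annotationId: str) -> Tuple[int, int]:
    """Return (startMs, endMs) for one aligned annotation (hypothetical helper).

    Assumes every referenced TIME_SLOT carries a TIME_VALUE, which holds
    for this sample but is not guaranteed in every EAF file.
    """
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in doc.iter("TIME_SLOT")}
    for ann in doc.iter("ALIGNABLE_ANNOTATION"):
        if ann.get("ANNOTATION_ID") == annotationId:
            return slots[ann.get("TIME_SLOT_REF1")], slots[ann.get("TIME_SLOT_REF2")]
    raise KeyError(annotationId)


doc = ET.fromstring(EAF_SNIPPET.strip())
firstLineTimes = lineTimes(doc, "a1")
```

Anything that needs times would then call the line's accessor and never touch the time slots directly.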

  • Paul

On Jun 28, 2019, at 10:49 AM, David Beck notifications@github.com wrote:

Yes, it could, but since we already do it in AudioExtractor, it seems both redundant and costly, no?

davidjamesbeck commented 5 years ago

I have gotten myself confused: when I run AudioExtractor, it numbers files from zero, not one. But I think we have always used 1-based counting before. I see that all of the test files (lovely to see so many of them!) end up with the 1-based wav files. Do you know what I am missing?

I may have changed the numbering to zero-numbering accidentally, thinking that I had made it 1-based to accommodate the startStopTable I was building. Since I ended up generating those numbers in Text, I might have assumed I changed it in AudioExtractor and “fixed” it. Anyway, if we’re not using AudioExtractor for timecodes, we can revert to its original state or whatever you find most useful.

David

On Jun 29, 2019, at 9:53 AM, Paul Shannon notifications@github.com wrote:

my ijalLine.py ijalLine.py-28jun-workInProgress

paul-shannon commented 5 years ago

It’s my bug - in my new toEAF.py.

davidjamesbeck commented 5 years ago

Okay. I think I’ll add a test to make sure that the filenames for the clips match up to the IDs of the line elements anyway, just to be sure we’re all lined up.

David

davidjamesbeck commented 5 years ago

Looks like everything is good to go. I added a couple of tests to make sure all the different numerations we need line up; they use BeautifulSoup4 to parse the html. All the unit tests pass and the html output looks and works okay.
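For anyone reading later, a check of this kind might look roughly like the sketch below. It swaps in the standard library's `html.parser` for BeautifulSoup4 so that it runs without extra installs, and the markup it expects (a `div` with class `line-wrapper` and a numeric `id`, plus a `source` element pointing at `N.wav`) is an assumption for illustration, not necessarily what slexil emits.

```python
from html.parser import HTMLParser


class LineAndClipCollector(HTMLParser):
    """Collect line IDs and clip basenames from (hypothetical) output markup."""

    def __init__(self):
        super().__init__()
        self.lineIds = []
        self.clipNames = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "line-wrapper":
            self.lineIds.append(attrs.get("id"))
        elif tag == "source":
            src = attrs.get("src") or ""
            if src.endswith(".wav"):
                # "audio/3.wav" -> "3"
                name = src.rsplit("/", 1)[-1]
                self.clipNames.append(name[:-len(".wav")])


def clipsMatchLines(html: str) -> bool:
    """True when every line ID has a same-numbered clip, in the same order."""
    collector = LineAndClipCollector()
    collector.feed(html)
    return collector.lineIds == collector.clipNames


# Made-up samples: one consistent, one with clips numbered from zero.
ONE_BASED = (
    '<div class="line-wrapper" id="1"><audio><source src="audio/1.wav"/></audio></div>'
    '<div class="line-wrapper" id="2"><audio><source src="audio/2.wav"/></audio></div>'
)
ZERO_BASED = (
    '<div class="line-wrapper" id="1"><audio><source src="audio/0.wav"/></audio></div>'
    '<div class="line-wrapper" id="2"><audio><source src="audio/1.wav"/></audio></div>'
)
consistent = clipsMatchLines(ONE_BASED)
offByOne = clipsMatchLines(ZERO_BASED)
```

A check like this would flag the zero-based versus one-based mismatch discussed earlier in the thread.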

David
