Closed justinsalamon closed 8 years ago
Looks like this is ready for a CR. @bmcfee (or anyone else)?
@bmcfee are you back from crunchland yet?
@bmcfee are you back from crunchland yet?
nope; next week okay?
nope; next week okay?
yeah sure, thanks!
@bmcfee I've updated the API again based on our latest discussion, refactoring trim()
into trim()
and slice()
. I've documented the new expected behavior of each function in detail here, which is probably worth reading before giving this PR a CR, which I think it's ready for again.
Review status: 0 of 2 files reviewed at latest revision, 16 unresolved discussions.
jams/core.py, line 47 at r2 (raw file):
import gzip import datetime
why datetime?
jams/core.py, line 897 at r2 (raw file):
of tuples to the annotation's sandbox keyed by `sandbox.trim` which documents each trim operation with a tuple `(start_time, end_time, trim_start, trim_end)`.
Please break this up into a short (one-line) description and then a more structured, detailed description.
Other notes:
self
is not an externally meaningful variable name for object methods.jams/core.py, line 939 at r2 (raw file):
# The annotation must have it's duration value set for trimming if self.duration is None:
Why is this an error? The semantics of duration=None
are that the annotation implicitly spans the full track duration as specified in file_metadata
.
jams/core.py, line 990 at r2 (raw file):
confidence=obs['confidence']) if 'trim' not in ann_trimmed.sandbox.keys():
you don't need to call .keys()
here
jams/core.py, line 995 at r2 (raw file):
else: ann_trimmed.sandbox.trim.append( (start_time, end_time, trim_start, trim_end))
I think trim history would make more sense as a list of dictionaries, rather than a list of tuples.
Also, your annotation construction seems to be a copy-by-reference of sandbox
, so I think this will also populate the trim
field in the input annotation's sandbox. This is probably not what you intended.
jams/core.py, line 1015 at r2 (raw file):
trimming operation will also be documented in `Annotation.sandbox.trim` as described in `Annotation.trim`.
Again, provide a short (1-sentence) docstring and then a detailed description.
jams/core.py, line 1023 at r2 (raw file):
---------- start_time : float The desired start time for slicing.
+"in seconds"
jams/core.py, line 1039 at r2 (raw file):
sliced_ann : Annotation The sliced annotation.
Add a "see also" to trim
Add an example
jams/core.py, line 1050 at r2 (raw file):
adjustment = ref_time - sliced_ann.time if sliced_ann.data is not None:
why would this ever be None
?
jams/core.py, line 1051 at r2 (raw file):
if sliced_ann.data is not None: for idx in range(len(sliced_ann.data.index)):
This isn't the correct way to iterate over dataframe indices.
Instead, do
for idx in sliced_ann.data.index:
or avoid iteration altogether:
sliced_ann.data['time'] = sliced_ann.data['time'].apply(lambda x: x - pd.to_timedelta(adjustment, unit='s')
jams/core.py, line 1058 at r2 (raw file):
slice_end = ref_time + sliced_ann.duration if 'slice' not in sliced_ann.sandbox.keys():
no need for .keys(), and same comments as in trim
above.
jams/core.py, line 1561 at r2 (raw file):
raise JamsError( 'Duration must be set (jam.file_metadata.duration) before ' 'trimming can be performed.')
I don't see why this should be required here.
jams/core.py, line 1567 at r2 (raw file):
start_time > float(self.file_metadata.duration) or end_time < start_time or end_time > float(self.file_metadata.duration)):
fun fact: python supports triple comparison, so you can do not (0 <= start_time <= self.file_metadata.duration)
jams/core.py, line 1576 at r2 (raw file):
jam_trimmed = JAMS(annotations=None, file_metadata=self.file_metadata, sandbox=self.sandbox)
I think you want a deep copy here, not copy-by-reference.
jams/core.py, line 1583 at r2 (raw file):
# Document jam-level trim in top level sandbox if 'trim' not in jam_trimmed.sandbox.keys():
-.keys()
_tests/jams_test.py, line 732 at r2 (raw file):_
def test_annotation_trim():
Please break this test into smaller test functions for each test case.
Comments from Reviewable
Review status: 0 of 2 files reviewed at latest revision, 16 unresolved discussions.
jams/core.py, line 47 at r2 (raw file):
Thought I needed it for some conversions but ended up using pandas instead. Removed.
_jams/core.py, line 897 at r2 (raw file):_
Removed references to self, broken up docstring into more manageable paragraphs, and added examples.
jams/core.py, line 939 at r2 (raw file):
I need to know the time range spanned by the annotation to compute the intersection with the trim range specified by the user, for which I need
Annotation.duration
.
Since an Annotation object can exist independently of a JAMS object, while duration=None
means "use the jams duration", there's no way for me to access this value from inside this function (or the value may not exist).
_jams/core.py, line 990 at r2 (raw file):_
Tried without but it breaks (problem in sandbox implementation?), leaving in for now.
jams/core.py, line 995 at r2 (raw file): OK, I've switched the history to be a list of dicts.
Regarding the sandbox, I don't think it's copied by reference:
# Create new annotation with same namespace/metadata
ann_trimmed = Annotation(
self.namespace,
data=None,
annotation_metadata=self.annotation_metadata,
sandbox=self.sandbox,
time=trim_start,
duration=trim_end - trim_start)
If you try it out the sandbox of the original annotation is unchanged:
>>> ann = jams.Annotation(namespace='tag_open', time=2, duration=8)
>>> ann_trim = ann.trim(5, 8)
>>> print(ann_trim.sandbox)
{
"trim": [
{
"start_time": 5,
"trim_start": 5,
"end_time": 8,
"trim_end": 8
}
]
}
>>> print(ann.sandbox)
{}
jams/core.py, line 1015 at r2 (raw file):
Fixed
jams/core.py, line 1023 at r2 (raw file):
Added
jams/core.py, line 1039 at r2 (raw file):
Added see also and example.
_jams/core.py, line 1050 at r2 (raw file):_
When I was playing around with this during initial development I seem to have reached this case, but I can't reproduce it, so removing the check for
None
.
[jams/core.py, line 1051 at r2](https://reviewable.io:443/reviews/marl/jams/136#-KVpHDoOOuzgPgC4xhjY:-KW4gpwTC7M22YwPsd8:b-z1ojeq) (raw file):_
Switched to lambda function implementation
_jams/core.py, line 1058 at r2 (raw file):_
As noted above with .keys() the call breaks, leaving as is for now.
jams/core.py, line 1561 at r2 (raw file):
You need it to perform the next check:
if not (0 <= start_time <= end_time <= float(self.file_metadata.duration)):
We could do away with this check, but then it would be possible to call JAMS.trim
with a time range that doesn't (or only partially) intersects with the jam. Would that make sense?
_jams/core.py, line 1567 at r2 (raw file):_
Nice, so I can actually do:
if not (0 <= start_time <= end_time <= float(self.file_metadata.duration)):
_jams/core.py, line 1576 at r2 (raw file):_ I think this does result in a deep copy, no?
>>> jam = jams.JAMS()
>>> jam.file_metadata.duration = 10
>>> jam_trim = jam.trim(5, 8)
>>> print(jam.sandbox)
{}
>>> print(jam_trim.sandbox)
{
"trim": [
[
5,
8
]
]
}
jams/core.py, line 1583 at r2 (raw file):
See previous comments, leaving as is for now.
_tests/jams_test.py, line 732 at r2 (raw file):_
Broke it.
Comments from Reviewable
Reviewed 1 of 3 files at r3. Review status: 1 of 3 files reviewed at latest revision, 1 unresolved discussion.
jams/core.py, line 939 at r2 (raw file):
Got it.
I think it's also reasonable, in this case, to just override the time/duration fields with the trim interval. Before trimming, the annotation spans whatever the jams file spans; afterwards, it explicitly spans the specified interval.
The only ambiguous case is when the trim interval overruns file_metadata.duration. I think this is detectable at the level of JAMS.trim
, but it's fine to let it slide at the annotation level.
The reason I'm picking on this point is that a lot of annotations do have implicit span, in which case we'd be unable to use trim/slice.
Comments from Reviewable
Review status: 1 of 3 files reviewed at latest revision, 1 unresolved discussion.
jams/core.py, line 939 at r2 (raw file):
Yeah I see your point. Like you say it can lead to some weird behavior (worse than the trim interval overrunning
file_metadata.duration
, it might not even intersect with it at all...). But I guess overall it's better to allow this to happen and trust the user will avoid it, at the benefit of being able to trim annotations that don't have an explicit duration set. Implemented.
Comments from Reviewable
Reviewed 1 of 2 files at r4. Review status: 2 of 3 files reviewed at latest revision, 2 unresolved discussions.
_tests/jams_test.py, line 983 at r4 (raw file):_
def test_jams_trim(): @raises(jams.ParameterError, jams.JamsError)
do you really need both errors here? it would be better to have separate test cases for separate error triggers. A catch-all for our package's base exception class is dangerous and could mask other errors.
_tests/jams_test.py, line 997 at r4 (raw file):_
trim_times = [(-5, -1), (-5, 10), (-5, 20), (5, 20), (18, 20), (10, 8)] for tt in trim_times: __test_error(jam, *tt)
if you yield __test_error, jam, tt
, each of these cases will show up separately. this makes debugging easier.
Comments from Reviewable
Review status: 2 of 3 files reviewed at latest revision, 2 unresolved discussions.
_tests/jams_test.py, line 983 at r4 (raw file):_
OK, I've broken these out into separate tests.
_tests/jams_test.py, line 997 at r4 (raw file):_
Sure, changed.
Comments from Reviewable
Reviewed 1 of 1 files at r5. Review status: all files reviewed at latest revision, all discussions resolved.
Comments from Reviewable
This PR will implement a trim function for jams objects as discussed in #135
This change is