isi-vista / adam

Abduction to Demonstrate an Articulate Machine
MIT License

Add spatial/size features to preprocessing #1183

Closed · boyleconnor closed 1 year ago

boyleconnor commented 1 year ago

I'm not sure if I've defined the `touching` and `relative_distance` features in the way that you intended, so please give feedback if you'd like to see them changed. For reference, here are the contents of a feature file after running the new preprocessing script:

Feature file (incl. new features):

```yaml
objects:
- color:
  - 102.34271343121262
  - 21.904295812941815
  - 14.09584013050571
  distance:
    object1: 68.79644935084134
  object_name: object0
  relative_distance:
    object1:
      relative_magnitude: 0.035752413577111536
      touching: true
      x_offset: 2.5398225845627196
      y_offset: -0.47677586589576915
  stroke_graph:
    adjacency_matrix:
    - - 0.0
      - 1.0
    - - 1.0
      - 0.0
    concept_name: banana
    confidence_score: 1.0
    size:
      area: 1924.2462946581147
      x: 17.40109507685969
      y: 110.58190798675736
    stroke_mean_x: 170.3899596004725
    stroke_mean_y: 172.161663308507
    stroke_std: 26.09357431141853
    strokes_normalized_coordinates:
    - - - -0.19679370424189446
        - -0.13314536137315894
        - -0.0705682371606178
        - -0.006365346046560497
        - 0.05857845372770687
        - 0.1200947143454908
        - 0.18431169092653332
        - 0.1818940098901814
        - 0.19095213865507057
        - 0.15066153751774813
      - - -1.4000328950011243
        - -1.3725433689151187
        - -1.3432738693943873
        - -1.2798830230784735
        - -1.2169513382458692
        - -1.152088418269606
        - -1.089176265104926
        - -1.023875357875486
        - -0.9618939894331183
        - -0.8979203682783222
    - - - 0.12643004814998426
        - 0.22176167396167565
        - 0.2181081760371411
        - 0.14001238635528657
        - 0.02398367562045615
        - -0.07778257832283154
        - -0.15492393608702595
        - -0.22121766971025056
        - -0.3104959113929362
        - -0.445495760852001
      - - -0.4927816687663369
        - -0.13512359999676343
        - 0.23328341777747616
        - 0.6133651295436751
        - 0.9848453110383716
        - 1.3626824676809097
        - 1.7341626491756061
        - 2.1142443609418105
        - 2.4826513787160414
        - 2.8403094474856183
  sub_part: null
  subobject_id: '0'
  texture: null
  viewpoint_id: '-1'
- color:
  - 102.34271343121262
  - 21.904295812941815
  - 14.09584013050571
  distance:
    object0: 68.79644935084134
  object_name: object1
  relative_distance:
    object0:
      relative_magnitude: 0.015554564844782166
      touching: true
      x_offset: -1.0971088356727254
      y_offset: 0.480198093880458
  stroke_graph:
    adjacency_matrix:
    - - 0.0
      - 0.0
      - 0.0
    - - 0.0
      - 0.0
      - 0.0
    - - 0.0
      - 0.0
      - 0.0
    concept_name: banana
    confidence_score: 1.0
    size:
      area: 4422.910575599893
      x: 40.28378300793783
      y: 109.79382385036598
    stroke_mean_x: 223.11274453326502
    stroke_mean_y: 127.96596903617561
    stroke_std: 53.98936705531424
    strokes_normalized_coordinates:
    - - - 0.17912628802197353
        - -0.3404940210984439
        - -0.7136629071740983
        - -0.8427728768657036
        - -0.9045767779088282
        - -0.8960156113482385
        - -0.9033295121652334
        - -0.8787865196947505
        - -0.7625618617507843
        - -0.6121304012239834
      - - -3.286981397730838
        - -3.244484497566174
        - -2.755934954123201
        - -2.2404572957746387
        - -1.7136990096078648
        - -1.1925306927102
        - -0.672119319539048
        - -0.14435992688830196
        - 0.37012736674281765
        - 0.8660241437120265
    - - - 0.6665068826956971
        - 0.6691902452788666
        - 0.6636528012144712
        - 0.6735123744435048
        - 0.6308949326253741
        - 0.6303689637740658
        - 0.6227809308712156
        - 0.6505225578992739
        - 0.6708831944090617
        - 0.6660370135591713
      - - 0.34825419130134094
        - 0.39695654376322775
        - 0.44321350049475916
        - 0.492233504383002
        - 0.5393691976668081
        - 0.5878522343274254
        - 0.6349879276112305
        - 0.6840079314994734
        - 0.730264888231008
        - 0.7789672406928927
    - - - 0.0013331393142680039
        - 0.0013331393142674472
        - 0.0013331393142691174
        - 0.0013331393142702308
        - 0.0013331393142691174
        - 0.0013331393142691174
        - 0.0013331393142691174
        - 0.04050744307917171
        - 0.0405074430791706
        - 0.0405074430791706
      - - 0.6615464754093683
        - 0.7007207791742676
        - 0.7398950829391723
        - 0.7790693867040749
        - 0.8182436904689753
        - 0.8574179942338768
        - 0.8965922979987794
        - 0.9357666017636809
        - 0.9749409055285811
        - 1.0141152092934838
  sub_part: null
  subobject_id: '0'
  texture: null
  viewpoint_id: '-1'
```

I tried to implement the feature extraction as described in the proposal document. A couple of the features work somewhat counterintuitively. Relative distance is normalized by the size of the current object, so the relative distance from x -> y is generally different from the relative distance from y -> x. `touching` is also determined from this value, so x can be "touching" y at the same time that y is not "touching" x.

I also set `GAMMA` as a fairly arbitrarily chosen global constant. I can work on searching for a "good" value of `GAMMA`, and/or I can make it a script parameter (i.e. let the user set `--gamma`).
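For concreteness, the current rule is roughly the following (a minimal sketch, not the PR's actual code: the helper names, the size measure, and the value of `GAMMA` are all assumptions based on the feature file above):

```python
import math

GAMMA = 0.5  # placeholder; the actual constant was chosen fairly arbitrarily


def center_distance(x: dict, y: dict) -> float:
    """Euclidean distance between the two objects' stroke-mean centers."""
    return math.hypot(
        x["stroke_graph"]["stroke_mean_x"] - y["stroke_graph"]["stroke_mean_x"],
        x["stroke_graph"]["stroke_mean_y"] - y["stroke_graph"]["stroke_mean_y"],
    )


def object_size(obj: dict) -> float:
    """Assumed linear size measure: the square root of the stroke graph's area."""
    return math.sqrt(obj["stroke_graph"]["size"]["area"])


def relative_distance(x: dict, y: dict) -> float:
    # Normalized by the size of *this* object (x) only, so in general
    # relative_distance(x, y) != relative_distance(y, x).
    return center_distance(x, y) / object_size(x)


def is_touching(x: dict, y: dict) -> bool:
    # Derived from the asymmetric relative distance, so x can be "touching" y
    # while y is not "touching" x.
    return relative_distance(x, y) < GAMMA
```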

I might also take some time to name and organize the features more consistently in the output feature files, and do some light refactoring (some of the conversion/casting is really inconsistent).

boyleconnor commented 1 year ago

Thanks for the review, @spigo900. To be clear, when you said:

> That would be another dict like `relative_distance`...

I think you meant to write `relative_size`. Correct?

spigo900 commented 1 year ago

Yes, it (ETA: the new dict) would be called `relative_size`, but I meant to write `relative_distance` -- I was comparing the dict to the one for `relative_distance`.

boyleconnor commented 1 year ago

@spigo900 I just realized that the way you specified relative distances in the document is different from what I did. The document says to normalize by the product of both objects' sizes, not just the size of the object whose feature set we're in. Do you want me to change my implementation to match the document's specification?
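Concretely, the document's version would look something like this (again just a sketch, reusing the hypothetical `center_distance` and `object_size` helpers from the earlier sketch):

```python
def relative_distance_symmetric(x: dict, y: dict) -> float:
    # Per the proposal document: normalize by the product of *both* objects'
    # sizes, which makes the measure (and hence "touching") symmetric in x and y.
    return center_distance(x, y) / (object_size(x) * object_size(y))
```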

spigo900 commented 1 year ago

@boyleconnor Ah, I forgot about that. Yes, please update that.

boyleconnor commented 1 year ago

I still think it's a bit strange to base the `touching` relationship on the center-to-center distance of two objects. It helps that it is based on relative distance (so it incorporates the objects' own sizes), but it's still not the way I'd do it.

spigo900 commented 1 year ago

@boyleconnor What would you do instead?

boyleconnor commented 1 year ago

@spigo900 I didn't have any one concrete method in mind when I wrote that, but I suppose you could look at the individual control points of the strokes (since they're outlines, they should actually represent the visual edges of the objects). Since the strokes are reduced to 10 control points each, even a brute-force comparison would be at most a couple thousand comparisons per object pair.
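Something like the following sketch, say (hypothetical names; the point format and threshold are assumptions):

```python
import math


def objects_touch_pointwise(points_a, points_b, threshold: float) -> bool:
    """Brute-force check over two objects' stroke control points.

    points_a, points_b: lists of (x, y) control points (10 per stroke), so two
    objects with a handful of strokes each need at most a few thousand
    comparisons. Returns as soon as one close pair is found (early exit).
    """
    for ax, ay in points_a:
        for bx, by in points_b:
            if math.hypot(ax - bx, ay - by) < threshold:
                return True
    return False
```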

spigo900 commented 1 year ago

@boyleconnor Want to implement that method and compare running times on the full curriculum? I'm not opposed if it's fast enough, and it's a fair point that the number of comparisons is low.

boyleconnor commented 1 year ago

@spigo900 I've been running stroke extraction on my laptop, which is not very scientific, but it should at least give us a rough estimate of the difference in run time. This is the time for running stroke extraction on `m5_objects_v0_with_mugs` with the pointwise touching calculations:

```
7640.89s user 1260.65s system 234% cpu 1:03:14.52 total
```

I am currently running it without the pointwise touching calculations to see the difference in time. I'd guess it will take around 45 minutes.

boyleconnor commented 1 year ago

Here is the time for the run with no pointwise touching calculations (same dataset, also on my laptop):

```
7673.44s user 1276.39s system 301% cpu 49:30.83 total
```

so in all, the pointwise touching calculations take about 13-14 minutes. If we later run this code on a curriculum full of multi-object situations, they could take significantly longer due to quadratic scaling in the number of objects, but I think the run time would still remain well within tractable limits.

spigo900 commented 1 year ago

@boyleconnor Good enough for me. If we find it's too slow, we can easily fall back on the dumber version, or on something slightly smarter based on the bounding boxes. I'll re-review this later today.

boyleconnor commented 1 year ago

At commit `b08024e`, I ran the stroke extraction script on the m5 train curriculum (again on my ISI laptop), and it finished with the following time:

```
7842.58s user 1292.84s system 301% cpu 50:30.49 total
```

It looks like the NumPy calculations were a big boost to speed.

EDIT: I don't think the NumPy ops version of this (the one I just committed) can do lazy evaluation, so in theory we lose a bit of an advantage in the cases where there actually are touching objects (where an early exit would have helped). On the other hand, I think the speedup in the cases we do have to process fully is big enough to justify it. So let's keep it using the NumPy ops.
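For reference, the vectorized check would look roughly like this (a sketch, not the committed code); it computes every pairwise distance up front, which is why there is no early exit, but the batched NumPy ops are much faster than a Python-level loop:

```python
import numpy as np


def objects_touch_pointwise_np(
    points_a: np.ndarray, points_b: np.ndarray, threshold: float
) -> bool:
    # points_a: (n_a, 2), points_b: (n_b, 2). Broadcasting gives all pairwise
    # differences at once, shape (n_a, n_b, 2); no early exit is possible.
    diffs = points_a[:, None, :] - points_b[None, :, :]
    return bool((np.linalg.norm(diffs, axis=-1) < threshold).any())
```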