inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/

'double counting' in line transect integration weights #43

Open ASeatonSpatial opened 6 years ago

ASeatonSpatial commented 6 years ago

When we use a SpatialLinesDataFrame as the samplers argument, the integration scheme for the 'coordinates' domain integrates over the whole strip. If a line segment has weight 2W (the full strip width) and length L, then the integration weight at the midpoint is 2WL.

When we combine this with the 'distance' domain trick, we get an extra factor of W in the integration weights that we do not want. To compensate, we add log(1/W) to the linear predictor.
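A sketch of where the extra factor comes from, assuming a constant intensity λ over the strip, a detection function g(r) on perpendicular distance r in [0, W], and a distance-domain integration scheme with points r_j and weights w_j that sum to W. The symbols λ, r_j and w_j are illustrative notation, not inlabru's.

```latex
% Target: expected number of detections for a segment of length L,
% counting both sides of the line
\Lambda_{\text{target}}
  = \int_0^L \int_0^W 2\,\lambda\, g(r)\,\mathrm{d}r\,\mathrm{d}l
  = 2L\,\lambda \int_0^W g(r)\,\mathrm{d}r

% What the product scheme computes: spatial weight 2WL times distance weights w_j
\Lambda_{\text{scheme}}
  = \sum_j (2WL)\, w_j\, \lambda\, g(r_j)
  \approx 2WL\,\lambda \int_0^W g(r)\,\mathrm{d}r
  = W\,\Lambda_{\text{target}}

% Adding log(1/W) to the linear predictor multiplies the intensity by 1/W,
% which removes the surplus factor
\sum_j (2WL)\, w_j\, \tfrac{1}{W}\,\lambda\, g(r_j) \approx \Lambda_{\text{target}}
```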

Intuitively, if my data is along a line and in a distance domain, I would want my integration to go along the line and through the distance domain, instead of integrating over an area and then undoing this by hand in the linear predictor.

However, if not using the 'distance' trick, the full integration over space is definitely preferable.

I don't have a clear suggestion for how this can be improved. Integrating along the line would require log(2g(r)) in the linear predictor instead of log(g(r)), which I am not sure is any more intuitive than log(1/W).

ASeatonSpatial commented 6 years ago

I got my maths wrong. I think the workaround to avoid log(1/W) should be either

line samplers with weight = 2

or

weight = 1 on lines but weight = 2 on integration in the distance domain.

There is no need for log(2g(r)).
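A minimal numerical sketch of why these weightings agree with the log(1/W) workaround. It assumes a constant intensity, a half-normal detection function, and a plain midpoint rule in the distance domain; all values and helper names are hypothetical, and none of this is inlabru code.

```python
import math

W = 2.0      # strip half-width (hypothetical value)
L = 10.0     # line segment length (hypothetical value)
lam = 0.5    # constant intensity per unit area (hypothetical value)
sigma = 1.0  # half-normal scale (hypothetical value)


def g(r):
    """Half-normal detection function, chosen only for illustration."""
    return math.exp(-0.5 * (r / sigma) ** 2)


def distance_integral(weight, n=2000):
    """Midpoint rule for weight * integral of g(r) over [0, W]."""
    h = W / n
    return weight * sum(g((j + 0.5) * h) * h for j in range(n))


# Original scheme: line weight 2W, with log(1/W) added to the predictor
original = (2 * W * L) * (1 / W) * lam * distance_integral(1.0)

# Line samplers with weight = 2, distance weight = 1
weight_on_line = (2 * L) * lam * distance_integral(1.0)

# Line samplers with weight = 1, distance weight = 2
weight_on_distance = (1 * L) * lam * distance_integral(2.0)

print(original, weight_on_line, weight_on_distance)
```

All three give the same expected count, because the factor 2 only enters the integral as a multiplicative constant; where it is attached does not matter for the integration term.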

finnlindgren commented 6 years ago

Or option 3, with weight=1, and +log(2) in the formula, which becomes +log(2 * pi * distance) for plot sampling.

For plot sampling, options 1 and 2 could be used for the 2*pi part; “we don’t know left or right” and “we don’t know the angle”.

The distance scaling in plot sampling must be in the formula, as there is no method for specifying it in the “integration” information without first constructing the integration points. [Edit: it's not just the integration that's the issue, but also what the intensity for the observed points should be.]
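A sketch of where the 2 * pi * distance term comes from, assuming a constant intensity λ around the sampling point, a detection function g(r), and a maximum detection radius W (illustrative notation, not inlabru's). Because the factor depends on r, it cannot be expressed as a single constant weight on the sampler.

```latex
% Expected detections around one point, in polar coordinates (r, \theta)
\Lambda = \int_0^{2\pi} \int_0^W \lambda\, g(r)\, r\,\mathrm{d}r\,\mathrm{d}\theta
        = \int_0^W 2\pi r\,\lambda\, g(r)\,\mathrm{d}r

% The log-intensity in the distance domain is therefore
\log\lambda + \log g(r) + \log(2\pi r)

% Line transect analogue: two sides of the line instead of a full circle
\int_0^W 2\,\lambda\, g(r)\,\mathrm{d}r,
\qquad \log\lambda + \log g(r) + \log 2
```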

finnlindgren commented 3 years ago

Options when the predictor is in "counts per area" units, apart from an offset:

For line transects:
1) domain = space, line-weight = line-width
2) domain = (space, distance), line-weight = 1, offset = +log(2)

For point transects:
1) domain = space, point-weight = site-area
2) domain = (space, distance), point-weight = 1, offset = +log(2*pi*distance)
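As a consistency check on the units, the two options in each pair should give the same expected count when detection is perfect (g = 1). A minimal sketch, assuming a constant intensity, a line width of 2W, a circular site of radius W, and a simple midpoint rule; all names and values are hypothetical and this is not inlabru code.

```python
import math

lam = 0.5  # intensity in counts per unit area (hypothetical value)
W = 2.0    # strip half-width / site radius (hypothetical value)
L = 10.0   # transect line length (hypothetical value)


def midpoint(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


# Line transects
# 1) domain = space, line-weight = line-width (2W)
line_opt1 = lam * (2 * W) * L
# 2) domain = (space, distance), line-weight = 1, offset = +log(2)
line_opt2 = L * midpoint(lambda r: math.exp(math.log(lam) + math.log(2)), 0.0, W)

# Point transects
# 1) domain = space, point-weight = site-area (pi * W^2)
point_opt1 = lam * math.pi * W ** 2
# 2) domain = (space, distance), point-weight = 1, offset = +log(2*pi*distance)
point_opt2 = midpoint(
    lambda r: math.exp(math.log(lam) + math.log(2 * math.pi * r)), 0.0, W
)

print(line_opt1, line_opt2)
print(point_opt1, point_opt2)
```

With a non-trivial detection function the first option in each pair has no distance dimension to carry g(r), so the agreement shown here is specific to the perfect-detection case.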

Probably the safest option is to not store the transect size in a weight column, and instead add it when needed; sampler = cbind(sampler, data.frame(weight = transectwidth))

finnlindgren commented 3 years ago

To check: whether the offset can be placed in E instead. This depends on whether E is used in all the places that would be required for this to work, and on whether that is well defined. [This might still be tricky to get to work.]