lmurmann / multi_illumination

MIT License
52 stars 8 forks

When are you going to publish the trained model? #1

Open Haoyanlong opened 4 years ago

Haoyanlong commented 4 years ago

@lmurmann Hello, I want to work on an illumination research topic. I would like to know when you are going to publish the model. Thank you very much!

lmurmann commented 4 years ago

Thanks for reaching out. I pushed commit 141754249ed2bfbcb3cea90405babeebc60490f6 with the model and the evaluation script for illumination estimation. I am also working on releasing the relighting model, but I still need a bit more time to clean up and package the code.

Haoyanlong commented 4 years ago

@lmurmann, I have downloaded the dataset from your project site! I want to train the illumination estimation model, and the probes in the input images are masked out. But I don't know how to mask the probes in the input image. Are they masked according to materials? Could you give me some advice? Thank you very much! image

lmurmann commented 4 years ago

There is some extra metadata in JSON files stored with each scene (~/.multilum/<scene>/meta.json).

If the file is not there, you can download it from http://data.csail.mit.edu/multilum/<scene>/meta.json.

This file should contain entries like this:

```json
{
  ...
  "bounding_box": {
    "x": 938.034985474631,
    "y": 2817.43052160101,
    "w": 975.3908405594467,
    "h": 948.096621403985
  },
  "boundary_points": [
    {
      "x": 1463.6340801792003,
      "y": 2820.1783656529014
    },
    {
      "x": 1692.8230026207625,
      "y": 2879.8437901088173
    },
    ...
```

for "gray" ball and "chrome" ball. The "boundary points" are hand-annotated points on the silhouette (around 10 points per ball). The "bounding_box" is a tight fitting axis-aligned bounding box. Coordinates are in pixels in range [0, 6000)x[0, 4000).

Haoyanlong commented 4 years ago

@lmurmann, I have trained the single-image illumination estimation model. The images are as follows (input, prediction, ground truth). I don't know how to render and composite a virtual object into the scene. Could you teach me? Thank you very much! image image image

lmurmann commented 4 years ago

Rendering objects into the images of our dataset is a bit difficult since you don't know the scene's geometry or camera pose. We only have a single viewpoint per scene, and so it is generally not possible to infer these values.

I would suggest you start using an existing AR application. Searching for something like "open source AR toolkit" brings plenty of hits that look good. Or you can build your own by following a tutorial. Searching for "opencv AR tutorial" should give some good hits. Building it yourself might take a while, but it is a great learning experience.

Once you have a basic AR system up and running, you can plug in the illumination estimation network and use the illumination prediction to improve the shading of virtual objects.

I hope these pointers are helpful!

Haoyanlong commented 4 years ago

@lmurmann Hello, I have trained the single-image illumination estimation model. The L2 loss curve (finetune) is as follows: image. It is about 0.01459 on the training dataset and 0.01897 on the test dataset. The test images are as follows: 1. input image, 2. ground-truth image, 3. predicted result image. Could you tell me your test results? And I have another question: can the trained model be used to render objects in video? Is it stable, or will the predictions sway? Thank you very much!

lmurmann commented 4 years ago

Thanks for your questions.

Regarding stability, I have used the model on video input before and found it was quite stable. For a real application, you might want to add a simple filter that smoothes out potential variations in the prediction. If you find that your predictions jump around dramatically, that may be a sign of overfitting. Also, you should try to make the input video look as much like the training data as possible to shrink the domain gap. The model will probably perform better on indoor videos than on outdoor data.
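As an illustration of the filtering idea (not code from this repo), a simple exponential moving average over the per-frame probe predictions could look like this:

```python
import numpy as np

def ema_filter(prev, current, alpha=0.8):
    """Exponential moving average over per-frame probe predictions (numpy arrays).
    Larger alpha means heavier smoothing; tune it for your video."""
    return alpha * prev + (1 - alpha) * current

# usage: smoothed = ema_filter(smoothed, model_prediction), called once per frame
```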

Regarding comparison to our model, you can run the probe_predict/eval.py script and compare its output to your predictions. The MSE numbers you report sound pretty good! In addition to MSE, we found it useful to compare other metrics, such as the direction of the center of the light source, since these are independent of the normalization or gamma choices that often have a large impact on MSE.

When comparing the center of the light source to the ground-truth center, our predictions achieved 26.6° mean angular error.

Haoyanlong commented 4 years ago

@lmurmann, could you tell me how to calculate the angular error between the center of the light source and the ground-truth center? On the other hand, I preprocess the input image by adding a black mask and resizing during training, but I see that you preprocess the input with a (512, 512) crop in eval.py. Could you tell me how you preprocess the input image during training? Thank you!

lmurmann commented 4 years ago

For calculating the center of the light source: in most cases, you can get an initialization by looking at the maximum image value. From that initialization, I fitted a Gaussian to refine the fit. This works pretty well, but I had to manually verify that the optimization converged to the correct light source shape in all cases.
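A rough sketch of the argmax-based version of this metric (skipping the Gaussian refinement, and assuming the predicted and ground-truth probes are square gray-ball crops where the brightest point's surface normal roughly points toward the light; this is an illustration, not the author's evaluation code):

```python
import numpy as np

def light_direction(probe):
    """Surface normal at the brightest pixel of a gray-ball probe crop,
    used as a proxy for the dominant light direction."""
    lum = probe.mean(axis=2) if probe.ndim == 3 else probe
    h, w = lum.shape
    y, x = np.unravel_index(np.argmax(lum), lum.shape)
    nx = (x - w / 2) / (w / 2)   # pixel -> unit-sphere coordinates
    ny = (y - h / 2) / (h / 2)
    nz = np.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    n = np.array([nx, ny, nz])
    return n / np.linalg.norm(n)

def angular_error_deg(pred_probe, gt_probe):
    """Angle in degrees between predicted and ground-truth light directions."""
    d = np.dot(light_direction(pred_probe), light_direction(gt_probe))
    return np.degrees(np.arccos(np.clip(d, -1.0, 1.0)))
```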

lmurmann commented 4 years ago

For training the published model, I took the (1500, 1000)-pixel images and sampled random 512x512px crops (the receptive field is half the image height).

I also tested with taking 256px crops (quarter of image height), but found the performance to be a bit worse, probably due to lack of context.
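A minimal sketch of that cropping step (not the exact training code), assuming the downsampled (1500, 1000) images are loaded as HxWxC numpy arrays:

```python
import numpy as np

def random_crop(img, size=512):
    """Sample a random size x size crop from an HxWxC image (e.g. 1000x1500x3)."""
    h, w = img.shape[:2]
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    return img[y:y + size, x:x + size]
```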

Haoyanlong commented 4 years ago

@lmurmann, hello, I have tested the results of your model and of my model trained from scratch, as shown below. I am confused by the difference in the sphere color. Could you give me some advice? Thank you very much!

1. input image, 2. ground-truth image, 3. the prediction of your trained model image, 4. the prediction of my trained model image

And I don't know how to normalize the white balance and exposure of the input image using the gray sphere. Thank you very much!

lmurmann commented 4 years ago

@Haoyanlong Both results look pretty reasonable. Are you sure that 3. was predicted from the published model? Usually our predictions get the slightly yellow color cast of the prediction shown in 4.

Below is some more background information on our data processing, along with advice on auto-exposure and custom white balance.

lmurmann commented 4 years ago

Auto Exposure

The published data is already exposure-normalized with respect to the gray ball, so there is not much extra work to do. To normalize exposure, we rescale the image intensity so that the mean intensity of the gray ball falls to a constant value (I believe around 0.3 in the .exr files).
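A minimal sketch of that normalization, assuming you already have a boolean mask for the gray ball (e.g. rasterized from the boundary_points above); the 0.3 target is the approximate constant mentioned here, not an exact value:

```python
import numpy as np

def normalize_exposure(img, gray_mask, target=0.3):
    """Rescale intensities so the mean value inside the gray-ball mask hits `target`."""
    m = img[gray_mask].mean()
    return img * (target / m) if m > 0 else img
```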

Normalizing on the gray ball can be suboptimal when there are large brightness variations across the image. In such cases, you can re-normalize after extracting training/eval patches. In the probe_predict/eval.py script we include the following auto-expose helper function:


```python
import numpy as np

def autoexpose(I):
  """Simple auto-expose helper for arbitrary image patches:
  clip the brightest 10% of pixels and map the lower 90% of pixels to the [0, 1] range.
  You might have to change the 90% threshold depending on the application.
  """
  n = np.percentile(I[:, :, 1], 90)  # 90th percentile of the green channel
  if n > 0:
    I = I / n
  return I
```
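For example, applied to a single crop (the random array below is just a stand-in for a decoded image):

```python
import numpy as np

img = np.random.rand(1000, 1500, 3).astype(np.float32)  # stand-in for a decoded image
patch = autoexpose(img[0:512, 0:512])                    # normalize each crop independently
```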
lmurmann commented 4 years ago

White Balance

Regarding white balance for the illumination estimation application, we rely on the camera-provided white balance and let the raw converter (dcraw) handle white balance for us. Relying on camera white balance generally works since the flash has a known color and so the raw converter simply matches the temperature of the flash.

In case you want to do white balance yourself using the gray ball, a simple approach is to extract the mean values of the red, green, and blue channels over the gray ball, and then simply rescale the red channel and the blue channel so that they match the mean intensity of the green channel.
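A minimal sketch of that per-channel rescaling, assuming an RGB image and a boolean mask for the gray ball (both the mask and the helper name are illustrative, not part of the released code):

```python
import numpy as np

def gray_ball_white_balance(img, gray_mask):
    """Scale R and B so their mean inside the gray-ball mask matches the G mean."""
    r, g, b = (img[gray_mask, c].mean() for c in range(3))
    out = img.copy()
    out[:, :, 0] *= g / r
    out[:, :, 2] *= g / b
    return out
```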

lmurmann commented 4 years ago

Just pushed the code and trained model for the relighting task in commit 11674f6f86.

artyomnaz commented 4 years ago

Hello, @lmurmann! Thanks for your article and the dataset. I would like to ask you about your experiments with the model for a single image. Have you tried to do upsampling instead of a fully connected layer at the output of your model?

lmurmann commented 4 years ago

Hi Artyom,

Yes, we have tried that and found that it works as well. With the encoder we are using, we found that the outputs of the fully connected decoder were a bit sharper, so we went with that instead.


artyomnaz commented 4 years ago

Ok, thanks for your answer :)


naoto0804 commented 4 years ago

I have a question about reproducing the light probe estimation experiment. Could you elaborate a bit more on experimental details?

I just guessed some parameters below and trained the model, but it does not seem to be working (it almost always outputs an average image): input/target: LDR RGB images (values in the range -0.5 to 0.5), lr = 1.0e-4, optimizer = Adam, iterations = 2.0e5, random crops (512x512), loss: MSE (L2).

In Sec. 4.2.1, you mention "Like in Sec 4.1, we work in the log-domain to limit the dynamic range of the network's internal activations", but there seems to be no such description in Sec. 4.1.

From left to right: input, output (no auto exposure), target input_output_target

lmurmann commented 4 years ago

Hi Naoto, your configuration generally looks good. You should be able to train a working version of the illumination prediction without converting to the log domain.

If the parameters that you mentioned don't work in your setup, I would try SGD with a 1e-3 step size and L1 loss. Generally we were able to train models with a variety of hyperparameters, and convergence should be better than the screenshots that you posted. For us, hyperparameter tuning mostly improved the sharpness of the predictions, but the general direction of the illumination should be correct for most training runs.
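As a rough illustration of that suggestion in PyTorch (the tiny stand-in network and random tensors below are placeholders, not the paper's architecture or data):

```python
import torch
import torch.nn as nn

# placeholder network: 512x512 RGB crop in, 64x64 RGB probe out
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                      nn.AdaptiveAvgPool2d(64),
                      nn.Conv2d(8, 3, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # SGD, 1e-3 step size
criterion = nn.L1Loss()                                   # L1 instead of MSE

crops = torch.rand(4, 3, 512, 512)   # dummy input crops
probes = torch.rand(4, 3, 64, 64)    # dummy ground-truth probes

optimizer.zero_grad()
loss = criterion(model(crops), probes)
loss.backward()
optimizer.step()
```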

hustliujian commented 4 years ago

> @lmurmann Hello, I have trained the single-image illumination estimation model. The L2 loss curve (finetune) is as follows: image. It is about 0.01459 on the training dataset and 0.01897 on the test dataset. The test images are as follows: 1. input image, 2. ground-truth image, 3. predicted result image. Could you tell me your test results? And I have another question: can the trained model be used to render objects in video? Is it stable, or will the predictions sway? Thank you very much!

The L2 loss curve you show almost doesn't decrease, so is this result normal? Also, when I compute the L2 loss on the test set with the weights provided by the author, the value is about 0.9, but your L2 loss is so small. I want to know whether there is any difference between your setup and the author's.