ypflll closed this issue 5 years ago
@gulvarol Hi, Gul. I found the test list in namescmu.txt (although I don't know how it was generated). However, when I extract the middle frame of these 507 videos, I also get some images where the human body is out of the frame, and this significantly impairs the performance. For example, on run1_104_21_c0004.mp4 I get a mean distance error of 2.34 m in world coordinates! Since there are only 507 test images, this single image alone raises the final error by more than 4 mm (2340 mm / 507 ≈ 4.6 mm). So I wonder if you left these images out when you produced the results in your paper. If not, do you think including these images is reasonable? Thanks.
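For reference, this is how I compute the per-image error (a minimal sketch; `pred` and `gt` are hypothetical (24, 3) arrays of 3D joint positions in meters, not names from the released code):

```python
import numpy as np

def mean_joint_error(pred, gt):
    # pred, gt: (n_joints, 3) arrays of 3D joint positions in world
    # coordinates (hypothetical inputs). Returns the mean Euclidean
    # distance over joints, in the same units as the inputs.
    return np.linalg.norm(pred - gt, axis=1).mean()
```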
Hi, these 507 clips are already provided as the val folder of the SURREAL dataset. At the time, I was not planning to use this dataset for evaluation, only for training, so the dataset does not have a proper train/val split. Here, val corresponds to the middle clip of each test sequence, to allow faster evaluation; the performance is quite similar to testing on the full test set. I don't remember the exact reason why it is fewer than 703; probably I took the sequences that are longer than some threshold. I see that among the 507, 493 were evaluated. The indices used in the paper are listed below; they correspond to testno + 1 in the fit_surreal.py file (see the sketch after the list). The remaining ones must have crashed for some reason. I don't do anything special for frames where the person is out of the frame; I report whatever the released code produces. Sorry for the inconvenience.
[001,002,003,004,005,006,007,008,009,010,011,012,013,014,015,016,017,018,019,020,021,022,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,038,039,040,041,042,043,044,045,046,047,048,049,050,051,052,053,054,055,056,057,058,059,060,061,062,064,065,066,067,068,069,070,071,072,073,074,075,076,077,078,079,080,081,082,083,084,085,086,087,088,089,090,091,092,093,094,095,096,097,098,099,100,101,102,103,104,105,106,107,108,109,110,111,112,113,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,153,154,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,462,463,464,465,466,467,468,469,470,471,472,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,504,505,506]
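A minimal sketch of how one could use this list (the variable names and the per-sequence error array are mine and purely illustrative; the only fact taken from fit_surreal.py is the testno + 1 offset):

```python
import numpy as np

# 1-based indices from the paper's evaluation (the full 493-entry list above).
eval_indices = [1, 2, 3, 4, 5, 6]  # truncated here for brevity
# fit_surreal.py uses a 0-based testno, so subtract 1.
testnos = [i - 1 for i in eval_indices]

# Hypothetical array of per-sequence errors over all 507 val sequences;
# average only over the sequences that were actually evaluated.
errors = np.random.rand(507)  # placeholder for real per-sequence errors
mean_err = errors[np.array(testnos)].mean()
```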
Thanks for clarifying this. I checked the list above. In most of the 14 clips you discarded, the human body is out of the image boundary, and they have very large errors; removing them makes the result 10 mm better.
However, in the middle frames of several of these 493 clips, most of the human body is still out of the image, e.g. val\run0\ung_10_04\ung_10_04_c0002, frame 34. These frames significantly impair the result, and I don't think it is meaningful to reconstruct a 3D mesh when most of the body is outside the image. So I suggest making a better dataset split and a standard test set; one possible filter is sketched below. Thanks.
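For example, one could keep only frames where most ground-truth joints project inside the image. A minimal sketch, assuming the SURREAL *_info.mat files store joints2D as a 2 x 24 x T array and the renders are 320 x 240; the min_ratio threshold is my own choice:

```python
import numpy as np
import scipy.io as sio

def person_in_frame(info_mat_path, frame_idx, width=320, height=240, min_ratio=0.5):
    # Keep a frame only if at least min_ratio of the 24 SMPL joints
    # project inside the image bounds.
    info = sio.loadmat(info_mat_path)
    j2d = info['joints2D']            # (2, 24, T), assumed layout
    if j2d.ndim == 2:                 # single-frame clips come back as (2, 24)
        j2d = j2d[:, :, None]
    x, y = j2d[0, :, frame_idx], j2d[1, :, frame_idx]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return inside.mean() >= min_ratio
```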
Sure, you can define a better split.
Hi, Gul. I am using the SURREAL test set to compare my results, but I ran into a problem. As mentioned in your paper, the test set has 30 subjects, 703 sequences, and 12,528 clips, rendered 3 times. When testing BodyNet, you get 507 images by taking the middle frame of the middle clip of each test sequence. Why is it 507 when there are 703 test sequences?
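For context, I select these frames roughly like this (a minimal sketch using OpenCV, assuming SURREAL's run*/<seq>/<seq>_c%04d.mp4 layout):

```python
import glob
import os
import cv2

def middle_frame_of_middle_clip(seq_dir):
    # Middle clip of the sequence, then the middle frame of that clip.
    clips = sorted(glob.glob(os.path.join(seq_dir, '*.mp4')))
    clip = clips[len(clips) // 2]
    cap = cv2.VideoCapture(clip)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, n_frames // 2)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError('could not read middle frame of ' + clip)
    return clip, frame
```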
Another question: when I take the middle frame of all 12,528 clips, the human body is sometimes out of the image, e.g. run2/104_32_c0003.
I get a wrong result when testing this image. How do you deal with images like this? Or do you just discard them (I get 1046 pictures like this)?