jzhoubu / vsearch

An Extensible Framework for Retrieval-Augmented LLM Applications: Learning Relevance Beyond Simple Similarity.

File for inference not found #5

Closed Clementine24 closed 4 months ago

Clementine24 commented 4 months ago

Hello,

Thank you for open-sourcing the code for this great work. I tried to run your quick start code for inference, but hit an error at the module import step. Judging by the traceback, some files appear to be missing from the repository. How should I resolve this?

Thank you very much for your attention to this matter.

Bug report:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[4], line 2
      1 import torch
----> 2 from src.vdr import Retriever
      4 # Initialize the retriever
      5 vdr_text2text = Retriever.from_pretrained("vsearch/vdr-nq")

File ~/project/VDR-master/src/vdr/__init__.py:1
----> 1 from src.vdr.modeling.retriever.retriever import Retriever, RetrieverConfig

File ~/project/VDR-master/src/vdr/modeling/retriever/retriever.py:11
      9 from ...utils.qa import has_answer
     10 from ...data.biencoder_dataset import _normalize
---> 11 from ...index.base import Index
     13 logger = logging.getLogger(__name__)
     16 class RetrieverConfig(BiEncoderConfig):

ModuleNotFoundError: No module named 'src.vdr.index'

jzhoubu commented 4 months ago

Thank you for reporting the issue! We have identified that the error occurred due to the recent removal of the index module. This issue has been fixed now. Please feel free to try again, and let us know if you encounter any further issues.

Clementine24 commented 4 months ago

Thank you very much for your prompt response to the previous issue! The inference code in the quick_start file is now running smoothly.

Next, I applied the disentangling method that the text2text example uses for query embedding visualization to the cross_modal model:

from src.vdr import Retriever  # same import as in the quick start

vdr_cross_modal = Retriever.from_pretrained("vsearch/vdr-cross-modal") # Note: encoder_p for images, encoder_q for text.

image_file = '../images/mars.png'
texts = [
    "Four thousand Martian days after setting its wheels in Gale Crater on Aug. 5, 2012, NASA’s Curiosity rover remains busy conducting exciting science. The rover recently drilled its 39th sample then dropped the pulverized rock into its belly for detailed analysis.",
    "ChatGPT is a chatbot developed by OpenAI and launched on November 30, 2022. Based on a large language model, it enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language."
]
image_emb = vdr_cross_modal.encoder_p.embed(image_file) # Shape: [1, V]
text_emb = vdr_cross_modal.encoder_q.embed(texts)  # Shape: [2, V]

# Image-text Relevance
scores = image_emb @ text_emb.t()
print(scores)

# tensor([[0.3209, 0.0984]])

disentanglement = vdr_cross_modal.encoder_q.dst(image_file, k=768, visual=True) # Generate a word cloud if `visual`=True
print(disentanglement)

# {'mars': 0.23402011394500732, 'german': 0.16442681849002838, 'strange': 0.12137473374605179, 'light': 0.10581395030021667, 'landscape': 0.1045181155204773, 'steel': 0.09839357435703278, 'graphs': 0.09210968017578125, 'remix': 0.09128298610448837, 'data': 0.08945354074239731, 'guidelines': 0.08771168440580368, 'filters': 0.08714032918214798, 'astronaut': 0.08688728511333466, 'test': 0.08608376979827881, '.': 0.0860578715801239, 'soft': 0.0851304903626442, 'martian': 0.08474336564540863, 'desert': 0.08458231389522552, 'bright': 0.08289938420057297, 'city': 0.08241231739521027, 'jupiter': 0.08211679756641388, 'image': 0.08204063773155212, 'gray': 0.08194388449192047, 'the': 0.07762885838747025, '##tically': 0.07689518481492996, 'night': 0.07577159255743027, 'love': 0.0738518014550209, 'own': 0.0727427676320076, 'architecture': 0.07260419428348541, 'alone': 0.07112161815166473, 'something': 0.07036653906106949, 'images': 0.07033450901508331, 'alien': 0.06987667828798294, 'sky': 0.06967131048440933, 'pattern': 0.0692824199795723, 'noise': 0.0690610483288765, 'skin': 0.06877890229225159, 'photography': 0.06786561012268066, 'page': 0.06778068095445633, 'another': 0.06671272218227386, 'to': 0.06401767581701279, 'columns': 0.0636846125125885, 'through': 0.06340207904577255, 'number': 0.06337990611791611, 'circle': 0.06329290568828583, 'modernist': 0.0632452443242073, 'answer': 0.06293407082557678, 'hospital': 0.06293286383152008, 'long': 0.06186148524284363, 'television': 0.06185596063733101, 'centre': 0.06142085790634155, 'stores': 0.061395395547151566, 'glass': 0.06137480586767197, 'sexy': 0.06067910045385361, 'close': 0.060572732239961624, 'eva': 0.060565296560525894, 'pages': 0.06031431257724762, 'paths': 0.06024845317006111, 'silver': 0.06004896014928818, 'lava': 0.05998194217681885, 'europe': 0.059908874332904816, 'color': 0.05973391979932785, 'created': 0.05960246920585632, 'curiosity': 0.0594881996512413, 'earth': 0.05946899205446243, 'kelly': 0.05943819507956505, 
'cruiser': 0.05925264582037926, 'planes': 0.059177786111831665, 'boxes': 0.05881067365407944, 'sandstone': 0.058601412922143936, 'symbols': 0.057823412120342255, 'clear': 0.05697939172387123, 'over': 0.05668037012219429, 'second': 0.056591615080833435, 'constructing': 0.055948901921510696, 'up': 0.055610399693250656, 'adam': 0.055051445960998535, 'modeling': 0.054980795830488205, 'off': 0.054910700768232346, 'focus': 0.05489692836999893, 'china': 0.054696954786777496, 'sweet': 0.05466734617948532, 'summer': 0.054604534059762955, 'ghosts': 0.05373286083340645, 'vehicle': 0.05352823808789253, 'sim': 0.05306227132678032, '##o': 0.05297122150659561, 'of': 0.0529252253472805, 'lines': 0.05285448953509331, 'dirt': 0.05275730788707733, 'home': 0.05257409065961838, 'brand': 0.05235454812645912, 'dark': 0.052325353026390076, 'building': 0.05184951052069664, 'red': 0.051477111876010895, 'seeds': 0.05062364041805267, ',': 0.05061899125576019, 'mark': 0.04938440024852753, 'facades': 0.04913909360766411, 'door': 0.049043502658605576, 'sunset': 0.048832256346940994, 'array': 0.048586104065179825, 'laboratories': 0.0479971244931221, 'highway': 0.047721680253744125, 'computer': 0.04749509319663048, 'water': 0.04738689213991165, 'eyes': 0.047359712421894073, 'what': 0.046999700367450714, 'music': 0.046890661120414734, 'electricity': 0.04646341875195503, 'park': 0.04630615934729576, 'a': 0.046265359967947006, 'arizona': 0.04590126872062683, 'toys': 0.045808564871549606, 'lamps': 0.04580793157219887, 'speed': 0.04536983370780945, 'sugar': 0.04529138654470444, 'funny': 0.04512602463364601, 'art': 0.04510129988193512, 'late': 0.045050449669361115, 'galaxy': 0.04470088705420494, 'beautiful': 0.04432560130953789, 'distant': 0.04420972988009453, 'towards': 0.04413794353604317, 'teacher': 0.044128358364105225, 'interior': 0.04398759454488754, 'in': 0.043876390904188156, 'asteroids': 0.043457817286252975, 'de': 0.04325849190354347, 'flare': 0.042493898421525955, '##ng': 0.04237714409828186, 
'hard': 0.04216272383928299, 'india': 0.042160890996456146, 'writing': 0.041779693216085434, 'rocks': 0.041566815227270126, 'teenager': 0.041488319635391235, 'hands': 0.04145174100995064, 'no': 0.04142449423670769, 'novels': 0.04140682891011238, 'solar': 0.04132654145359993, 'type': 0.04123372957110405, 'my': 0.04119125381112099, 'winter': 0.04110112413764, 'stations': 0.040981173515319824, 'life': 0.040935907512903214, 'trains': 0.040902525186538696, 'skyscraper': 0.04078696295619011, '9': 0.04072955995798111, 'greenhouse': 0.040627460926771164, 'flames': 0.04058598354458809, 'walk': 0.04043610021471977, 'she': 0.04024743288755417, 'garden': 0.040220167487859726, 'royal': 0.039981331676244736, 'iconic': 0.0399278923869133, 'land': 0.039896946400403976, 'explosion': 0.03968171402812004, '##books': 0.039424143731594086, 'suns': 0.03920156881213188, 'woman': 0.039152491837739944, 'your': 0.03912977874279022, 'beauty': 0.03908013552427292, 'la': 0.03899979591369629, 'tenants': 0.038889165967702866, 'design': 0.038726601749658585, 'antenna': 0.03869028016924858, 'newspaper': 0.038629136979579926, 'built': 0.03850821778178215, '-': 0.038317300379276276, 'win': 0.03829587250947952, 'nice': 0.03804556280374527, 'surfaces': 0.03762289509177208, 'b': 0.03762213513255119, 'phone': 0.037619754672050476, 'plains': 0.037574488669633865, 'peninsula': 0.03744557127356529, 'ferries': 0.03743363171815872, 'motors': 0.03731761500239372, 'little': 0.037299592047929764, 'workplace': 0.03715487942099571, 'wide': 0.0371004082262516, 'magazine': 0.03703739866614342, 'wood': 0.03693687915802002, 'sunglasses': 0.03692129999399185, 'screens': 0.03690403327345848, 'storms': 0.036675769835710526, 'voice': 0.03638068959116936, 'talk': 0.03634042292833328, '##page': 0.036306340247392654, 'lips': 0.03617602586746216, 'glitter': 0.03613557666540146, 'rainbow': 0.03612335026264191, 'street': 0.03606003150343895, 'radar': 0.036016929894685745, 'sign': 0.03575138747692108, 'kate': 
0.035750243812799454, 'police': 0.03574397414922714, 'footprints': 0.035692717880010605, 'blocks': 0.03540953993797302, 'minimal': 0.03539125621318817, 'numbers': 0.0353628545999527, 'published': 0.035338759422302246, 'palms': 0.03520473837852478, 'president': 0.03514966368675232, 'trees': 0.03500761836767197, 'food': 0.03494328260421753, 'fast': 0.034920357167720795, 'portico': 0.03484898433089256, 'channel': 0.03478921949863434, 'me': 0.03477315604686737, ':': 0.03468853607773781, 'wildlife': 0.034653328359127045, 'shore': 0.034638360142707825, 'passport': 0.03459177166223526, 'live': 0.034493643790483475, 'impressive': 0.034173447638750076, 'shot': 0.03416561707854271, 'day': 0.03411514312028885, 'residential': 0.034111592918634415, 'sun': 0.03391562029719353, 'interview': 0.03389817848801613, 'from': 0.03387217968702316, 'smile': 0.03382942080497742, 'valleys': 0.03376289829611778, 'final': 0.033445801585912704, 'daughter': 0.03333960846066475, 'babies': 0.03329978883266449, 'covering': 0.03327321261167526, 'word': 0.032989855855703354, 'farmland': 0.03298003971576691, 'laboratory': 0.03292936086654663, 'buildings': 0.032778963446617126, 'temples': 0.0326603464782238, 'blossoms': 0.03242591768503189, 'standing': 0.03240366652607918, 'sleeping': 0.03219300135970116, 'thing': 0.03217542916536331, 'nave': 0.03215274214744568, 'just': 0.03211737424135208, 'hot': 0.03211153298616409, 'trip': 0.032072555273771286, 'grass': 0.031964316964149475, 'see': 0.03194059804081917, 'force': 0.031671538949012756, 'rest': 0.03154302015900612, 'sites': 0.03146794065833092, 'offices': 0.03129897266626358, 'mirrored': 0.031042974442243576, 'malls': 0.03102550283074379, 'cities': 0.03091946244239807, 'men': 0.03091093711555004, 'meeting': 0.030802950263023376, 'skiing': 0.030796311795711517, 'cups': 0.03070848435163498, 'tall': 0.030677825212478638, 'james': 0.030670594424009323, 'new': 0.03066241927444935, 'council': 0.030262593179941177, 'work': 0.030237145721912384, 'son': 
0.03019575960934162, 'her': 0.03015523962676525, 'landscapes': 0.03015047498047352, 'white': 0.030128974467515945, 'instrument': 0.029925579205155373, 'triangle': 0.029748711735010147, 'jessica': 0.029693281278014183, 'march': 0.029648052528500557, 'colors': 0.029510285705327988, 'animal': 0.029453687369823456, '##a': 0.029424557462334633, 'on': 0.029390031471848488, 'contrast': 0.029340384528040886, 'down': 0.02926803007721901, 'alpine': 0.029128924012184143, 'terra': 0.029116498306393623, '##t': 0.028634848073124886, 'lunar': 0.028584687039256096, 'mineral': 0.028449647128582, 'and': 0.028441088274121284, 'neighborhood': 0.028370581567287445, 'forest': 0.028185652568936348, 'disco': 0.02808990329504013, 'sheep': 0.028082363307476044, 'dare': 0.028012501075863838, 'sitting': 0.027899248525500298, 'wind': 0.027881167829036713, 'this': 0.027531588450074196, 'fitted': 0.027512283995747566, 'staring': 0.027413025498390198, 'shift': 0.027286848053336143, 'flew': 0.026944253593683243, 'laptop': 0.026732079684734344, 'conversation': 0.0267232283949852, 'vegetation': 0.02658265270292759, 'expo': 0.026467440649867058, 'stone': 0.02633984014391899, 'addition': 0.02629685401916504, 'at': 0.02628813311457634, 'cleaning': 0.02617654949426651, '##y': 0.026085935533046722, 'valley': 0.02587367407977581, 'sunlight': 0.025801507756114006, 'wheat': 0.02572021260857582, 'date': 0.025635868310928345, 'film': 0.025506943464279175, 'not': 0.02549441158771515, 'nasa': 0.025312263518571854, 'shape': 0.025276221334934235, 'cameras': 0.025268448516726494, 'groups': 0.02520405501127243, 'celebration': 0.0251747015863657, 'for': 0.02513519488275051, 'honey': 0.025094376876950264, 'name': 0.025003938004374504, 'senior': 0.02500328980386257, 'entrance': 0.024900395423173904, 'roads': 0.02474856749176979, 'balls': 0.024695925414562225, 'maps': 0.02467319741845131, 'globe': 0.024653110653162003, 'chatting': 0.024623990058898926, 'these': 0.02461301162838936, 'egypt': 0.024559402838349342, 
'decorated': 0.024558238685131073, '##2': 0.02455565705895424, 'flower': 0.024527674540877342, 'transport': 0.024526936933398247, 'black': 0.02451336942613125, 'input': 0.02446603961288929, 'pair': 0.02431299351155758, 'female': 0.02399502880871296, 'book': 0.02398073486983776, 'wound': 0.023970363661646843, 'italy': 0.023887690156698227, 'blinds': 0.023856421932578087, 'expanse': 0.023608287796378136, 'greenland': 0.023508412763476372, 'construction': 0.023506198078393936, 'right': 0.023486116901040077, 'wild': 0.0234490055590868, 'moon': 0.02332276664674282, 'small': 0.023251762613654137, 'israeli': 0.023224804550409317, 'flashlight': 0.023212216794490814, 'beef': 0.023195821791887283, 'novel': 0.023185281082987785, 'area': 0.023031039163470268, 'group': 0.022823844105005264, 'petrol': 0.022803030908107758, '##ched': 0.022777589038014412, 'sticks': 0.022705333307385445, 'young': 0.02257453463971615, 'forum': 0.02256106585264206, 'london': 0.022515607997775078, 'joe': 0.0224336925894022, 'text': 0.02242666855454445, 'dense': 0.022400736808776855, 'covers': 0.02238554321229458, 'running': 0.022315960377454758, 'face': 0.022133484482765198, 'interstate': 0.02207862213253975, 'facility': 0.021913476288318634, 'czech': 0.021909011527895927, 'jump': 0.02190580405294895, 'engine': 0.021824119612574577, 'comics': 0.021812252700328827, 'manor': 0.021767526865005493, 'controls': 0.021743467077612877, 'foot': 0.02171117253601551, 'telescope': 0.02170446887612343, 'resident': 0.021678315475583076, 'set': 0.02159399352967739, 'external': 0.021562332287430763, 'cracked': 0.021498803049325943, 'uniform': 0.021457383409142494, 'bestseller': 0.021402457728981972, 'islands': 0.02134004794061184, 'surgery': 0.021281305700540543, 'king': 0.021115761250257492, 'spheres': 0.021107187494635582, 'led': 0.02104976586997509, 'swirl': 0.02100660838186741, 'japan': 0.020999297499656677, 'busy': 0.020922526717185974, 'refinery': 0.02085566520690918, 'brown': 0.020834388211369514, 'volcano': 
0.020822694525122643, 'hallway': 0.020762328058481216, 'geology': 0.020609527826309204, 'rock': 0.020564915612339973, 'seating': 0.020548444241285324, 'industrial': 0.020536433905363083, 'best': 0.020511452108621597, 'innocent': 0.020317498594522476, 'books': 0.020307743921875954, 'concerts': 0.020300069823861122, 'deserts': 0.020296610891819, 'studying': 0.020291008055210114, 'first': 0.020257942378520966, 'wool': 0.02024562656879425, 'barely': 0.020122839137911797, 'subway': 0.020068250596523285, 'device': 0.01996430568397045, 'warm': 0.01989051140844822, 'grains': 0.01985350251197815, 'he': 0.019847191870212555, 'homework': 0.01983356848359108, 'river': 0.019779672846198082, 'monitors': 0.019777841866016388, 'evening': 0.01977716200053692, 'formation': 0.019693929702043533, 'con': 0.0196083877235651, 'train': 0.01959194242954254, 'transmitter': 0.019487569108605385, 'engines': 0.019454237073659897, 'thanks': 0.019410287961363792, 'van': 0.01940869353711605, 'hair': 0.019307555630803108, 'astronomers': 0.019291378557682037, 'parents': 0.019251074641942978, 'airbus': 0.01923845335841179, 'southern': 0.01918417029082775, 'sleep': 0.019180677831172943, '##r': 0.019157016649842262, 'palm': 0.019133897498250008, 'lasers': 0.019117850810289383, 'stars': 0.019061116501688957, 'modelling': 0.019060343503952026, 'avenue': 0.019056908786296844, 'rocket': 0.019037507474422455, 'survey': 0.019021200016140938, 'daniel': 0.018977900967001915, 'ring': 0.018942169845104218, 'barren': 0.018926473334431648, 'east': 0.01888135075569153, 'states': 0.01884542405605316, 'ipod': 0.01880199834704399, 'winters': 0.018754146993160248, 'time': 0.018726926296949387, 'curtains': 0.018711021170020103, 'ruins': 0.018703805282711983, 'trails': 0.018625209107995033, 'an': 0.018581613898277283, 'gas': 0.018505726009607315, 'plasma': 0.018482910469174385, 'flowers': 0.018441030755639076, 'atlas': 0.018440714105963707, 'band': 0.018380463123321533, 'field': 0.018332960084080696, 'after': 
0.018319394439458847, 'machines': 0.01831834763288498, 'they': 0.01826302893459797, 'apples': 0.01819789409637451, 'i': 0.01814705692231655, 'stairwell': 0.018028153106570244, 'carlos': 0.018016956746578217, 'blue': 0.01797725446522236, 'allie': 0.017966827377676964, 'typed': 0.017959481105208397, 'girls': 0.017935926094651222, 'wetland': 0.017894573509693146, 'garbage': 0.017874224111437798, 'site': 0.01785130240023136, 'hats': 0.017780158668756485, 'information': 0.017771773040294647, 'g': 0.017696410417556763, 'infrared': 0.017669036984443665, 'flame': 0.017635123804211617, 'statistics': 0.017614996060729027, 'out': 0.017579784616827965, 'buses': 0.017567545175552368, 'leaving': 0.01752491481602192, 'smoke': 0.01750955544412136, 'indian': 0.0174952894449234, 'apartments': 0.017424514517188072, 'passing': 0.017390277236700058, 'lights': 0.017353788018226624, 'illinois': 0.01733563467860222, 'knew': 0.017304375767707825, 'go': 0.017302244901657104, 'hillside': 0.017255738377571106, 'winding': 0.017227090895175934, 'dome': 0.01718240976333618, 'finger': 0.01716688647866249, 'writer': 0.0171486996114254, 'texts': 0.017131926491856575, 'straight': 0.017124446108937263, 'sea': 0.017118094488978386, 'cells': 0.017106924206018448, 'sailing': 0.017080048099160194, 'muscle': 0.01707574538886547, 'toilets': 0.0170179083943367, 'found': 0.016991397365927696, 'concrete': 0.01698746345937252, 'died': 0.016939615830779076, 'flag': 0.016899993643164635, 'computing': 0.01683579944074154, 'dwelling': 0.01682058349251747, 'aliens': 0.016813265159726143, 'friends': 0.016805676743388176, 'experimental': 0.0167095810174942, 'creates': 0.016705244779586792, 'russia': 0.01670357584953308, 'bible': 0.01669989712536335, 'path': 0.01665021851658821, 'illustration': 0.016515789553523064, 'its': 0.016512688249349594, '##e': 0.016469748690724373, 'broken': 0.01643780991435051, 'ins': 0.016429755836725235, 'mountains': 0.016429657116532326, 'church': 0.016344841569662094, '##ung': 
0.01632777415215969, 'texture': 0.01627299003303051, 'clouds': 0.016253873705863953, 'egyptian': 0.01624104008078575, 'roman': 0.016186853870749474, 'villages': 0.016178306192159653, 'made': 0.016173645853996277, 'orange': 0.016171472147107124, 'birthday': 0.01605742797255516, 'speeds': 0.01605110801756382, 'envelope': 0.01596388965845108, 'solution': 0.015954066067934036, '5': 0.01590071991086006, 'clocks': 0.015843508765101433, 'green': 0.01583705097436905, 'roots': 0.0158089529722929, 'sparkling': 0.015803208574652672, 'pueblo': 0.015778763219714165, 'david': 0.015768595039844513, 'swimming': 0.01574060134589672, 'sale': 0.015738293528556824, 'medical': 0.01571236550807953, 'phoenix': 0.015659870579838753, 'pyramid': 0.015656230971217155, 'golden': 0.015621555969119072, 'gardener': 0.01556569617241621, 'auditorium': 0.015559639781713486, 'pants': 0.015525753609836102, 'wave': 0.015513562597334385, 'south': 0.015510049648582935, 'training': 0.01549061480909586, 'casino': 0.015435602515935898, 'apple': 0.015334682539105415, 'venus': 0.015323818661272526, 'cotton': 0.015267428942024708, 'power': 0.015263699926435947, 'pipes': 0.015135928057134151, 'twisted': 0.01504134014248848, 'around': 0.015037700533866882, 'exam': 0.014930550940334797, 'tube': 0.01493009366095066, 'playground': 0.014916257932782173, 'chicago': 0.014902296476066113, 'committee': 0.014867933467030525, 'e': 0.014852894470095634, 'shows': 0.014832571148872375, 'climbing': 0.014830067753791809, 'racetrack': 0.014758585020899773, 'meal': 0.014715763740241528, 'canyon': 0.014712681993842125, 'terrain': 0.014701255597174168, 'bobby': 0.014693457633256912, 'bookstore': 0.014659816399216652, 'military': 0.014654104597866535, 'state': 0.014624851755797863, 'is': 0.014619821682572365, 'governmental': 0.014569004066288471, 'french': 0.014533921144902706, 'heavens': 0.014526253566145897, 'fire': 0.014513863250613213, 'next': 0.014495511539280415, 'you': 0.014465127140283585, 'purse': 0.01446300558745861, 
'gay': 0.014456101693212986, 'school': 0.014419387094676495, 'kid': 0.014415224082767963, 'explore': 0.014403758570551872, 'empty': 0.014376136474311352, 'their': 0.014297276735305786, 'written': 0.014288884587585926, 'bus': 0.014276747591793537, 'documents': 0.014214527793228626, 'salt': 0.014204912818968296, 'shops': 0.0141731146723032, 'reactor': 0.01409945823252201, 'cards': 0.014054492115974426, 'cooked': 0.01400679349899292, 'shooting': 0.013989444822072983, 'tokyo': 0.01391876582056284, 'songs': 0.013876664452254772, 'bird': 0.013869404792785645, 'cup': 0.013840035535395145, 'reading': 0.013797158375382423, 'illustrations': 0.013789698481559753, 'mhz': 0.013785464689135551, 'singapore': 0.013744637370109558, 'signage': 0.013722357340157032, 'interesting': 0.01372167095541954, 'closed': 0.01371072605252266, 'bikes': 0.013695324771106243, 'charcoal': 0.01367360632866621, 'american': 0.013670061714947224, 'wrist': 0.013664551079273224, '##fire': 0.01366092637181282, 'it': 0.013616424985229969, 'blur': 0.0136055126786232, 'riders': 0.013579375110566616, 'radio': 0.01350681483745575, 'gold': 0.01350395753979683, 'vamp': 0.013469698838889599, 'into': 0.01346607692539692, 'hit': 0.01345121394842863, 'awards': 0.01343684270977974, 'posters': 0.013428810052573681, 'reflects': 0.013360799290239811, 'peaks': 0.013290847651660442, 'club': 0.013273625634610653, 'us': 0.013266284950077534, 'gloom': 0.013219066895544529, 'friend': 0.01319130789488554, 'gu': 0.01318750623613596, 'estadio': 0.01317092776298523, 'sink': 0.01316932961344719, 'mast': 0.013154320418834686, 'junction': 0.013149083591997623, 'hills': 0.013136595487594604, 'vessel': 0.013105479069054127, 'crystal': 0.01308872364461422, 'sings': 0.013081945478916168, 'articles': 0.013058534823358059, 'exposure': 0.013031113892793655, 'storm': 0.013023899868130684, 'baby': 0.012957978993654251, '##ia': 0.012957925908267498, 'families': 0.01295551285147667, 'dramatic': 0.012940180487930775, 'head': 
0.012912039645016193, 'bliss': 0.012904993258416653, 'races': 0.01289819460362196, '##ing': 0.012894283048808575, 'farm': 0.012886574491858482, 'paintings': 0.012849200516939163, 'or': 0.012842826545238495, 'laid': 0.0128097552806139, 'museums': 0.012789065018296242, 'bees': 0.012781757861375809, 'h': 0.012771910056471825, '##m': 0.012768971733748913, 'glasses': 0.012757259421050549, 'balanced': 0.012748748995363712, 'grid': 0.012739974074065685, 'complex': 0.012727586552500725, 'by': 0.012708392925560474, 'fleet': 0.012701979838311672, 'structure': 0.012611104175448418, 'constellation': 0.01259595062583685, 'good': 0.012589750811457634, 'camera': 0.012573628686368465, 'stockholm': 0.01256292499601841, 'score': 0.01255599781870842, 'sudan': 0.012509748339653015, 'letters': 0.012494265101850033, 'flying': 0.012493157759308815, 'nail': 0.01248058769851923, 'pitch': 0.012477606534957886, 'marker': 0.012458102777600288, 'ship': 0.012441476806998253, 'lighted': 0.012410880997776985, 'coin': 0.01240119244903326, 'pure': 0.012382188811898232, 'reds': 0.012337662279605865, 'people': 0.012330206111073494, 'rebuilt': 0.012325502000749111, 'rails': 0.012311865575611591, 'court': 0.012299077585339546, 'with': 0.012297313660383224, 'bands': 0.012284955941140652, 'trail': 0.012281125411391258, 'hand': 0.012265644036233425, 'youth': 0.012265557423233986, 'exploring': 0.012229752726852894, 'together': 0.012228380888700485, 'accounts': 0.012226208113133907, 'glowed': 0.01221870444715023, 'infinity': 0.012197018601000309, 'illustrator': 0.01215285249054432, 'libraries': 0.012145832180976868, '##dial': 0.012138773687183857, 'picture': 0.012113059870898724, 'premises': 0.012110273353755474, 'panels': 0.012103904038667679, 'lenses': 0.012101112864911556, '##er': 0.012094092555344105, 'magazines': 0.012084420770406723, 'view': 0.012078442610800266, 'destroyed': 0.012030408717691898, 'bishop': 0.012018187902867794, 'expanded': 0.011952213011682034, 'younger': 0.01189960353076458, 
'freeway': 0.011836639605462551, 'peak': 0.011815783567726612, 'space': 0.011804696172475815, 'backseat': 0.01179217267781496, 'back': 0.011735866777598858, 'our': 0.011728242039680481, 'met': 0.011707865633070469, 'uniforms': 0.011691398918628693, 'marriage': 0.011682759039103985, 'foods': 0.011669676750898361, 'africa': 0.011650213971734047, 'suite': 0.011646266095340252, 'furniture': 0.011638885363936424, 'smart': 0.011617830023169518, 'depicts': 0.011557526886463165, 'arrangement': 0.011525571346282959, 'platforms': 0.011487352661788464, 'mirror': 0.011434506624937057, 'experiments': 0.011432684026658535, 'policeman': 0.011424041353166103, 'dog': 0.011423584073781967, 'too': 0.011423190124332905, 'redhead': 0.011415420100092888, 'traffic': 0.011392551474273205, 'flow': 0.01138256210833788, 'mcdonald': 0.01137094758450985, 'show': 0.011366966180503368, 'maize': 0.011343753896653652, 'coloured': 0.01133289746940136, 'fitness': 0.011326707899570465, 'glacier': 0.011326165869832039, 'draft': 0.011315543204545975, 'constructed': 0.011290974915027618, 'pointing': 0.011268377304077148, 'views': 0.011254318989813328, 'christians': 0.011240947991609573, 'herd': 0.01118269469588995, 'rice': 0.011161146685481071, 'lab': 0.011159740388393402, 'nipple': 0.011145343072712421, 'taking': 0.011128343641757965, 'scalp': 0.011128200218081474, 'windshield': 0.011077904142439365, 'waving': 0.011065851897001266, 'graves': 0.011053146794438362, 'fields': 0.011035308241844177, 'destroyer': 0.01100898440927267, 'warning': 0.011004449799656868, 'fiber': 0.010990029200911522, 'seeing': 0.0109693119302392, 'original': 0.010956460610032082, 'writes': 0.010950960218906403, 'guide': 0.010938023217022419, 'further': 0.010922165587544441, 'boys': 0.010913211852312088, '##creen': 0.010887783020734787, 'we': 0.010869347490370274, 'top': 0.01086364220827818, '##screen': 0.01081862859427929, 'radiation': 0.010816633701324463, 'living': 0.010767914354801178, 'spray': 0.010763196274638176, '##work': 
0.010756940580904484, 'how': 0.010746831074357033, 'pregnant': 0.010741638951003551, 'decided': 0.01073278859257698, 'government': 0.010714204981923103, 'tropical': 0.010708420537412167}
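
With k=768 the printed dictionary is hard to scan by eye. The sketch below is one way to read it, assuming only that `dst` returns a plain token-to-weight dict as printed above. `top_tokens` is a hypothetical helper, not part of vsearch, and the sample weights are copied (rounded) from the output.

```python
def top_tokens(weights, k=10, drop_noise=True):
    """Return the k highest-weighted (token, weight) pairs.

    With drop_noise=True, WordPiece continuation pieces ('##...') and
    tokens without any letter or digit (e.g. '.') are skipped.
    """
    items = []
    for token, weight in weights.items():
        is_subword = token.startswith("##")
        is_punct = not any(ch.isalnum() for ch in token)
        if drop_noise and (is_subword or is_punct):
            continue
        items.append((token, weight))
    items.sort(key=lambda pair: pair[1], reverse=True)
    return items[:k]


# A few entries copied (rounded) from the mars.png output above.
sample = {
    "mars": 0.2340, "german": 0.1644, "strange": 0.1214, "light": 0.1058,
    "landscape": 0.1045, ".": 0.0861, "martian": 0.0847, "##tically": 0.0769,
}

for token, weight in top_tokens(sample, k=5):
    print(f"{token:10s} {weight:.4f}")
```

Filtering this way only changes what is displayed; it does not explain why tokens such as 'german' receive weight in the first place.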

However, the results obtained are different from those shown in the README.

image

Additionally, the word cloud contains many irrelevant tokens such as 'german', 'strange', 'light', etc. Could there be any issues with the disentangling code I am using for the cross_modal model?
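
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: the comment in the snippet says encoder_q is the text encoder, yet `dst` is called on encoder_q with an image path. If the path were treated as a plain string, the words inside it would be expected to score highly, and 'mars' and 'images' both appear in the output. The sketch below checks which words of the path occur in the dictionary. `path_words` is a hypothetical helper, and the weights are copied (rounded) from the mars.png output.

```python
import re


def path_words(path):
    """Split a file path into lowercase alphanumeric words."""
    return [w for w in re.split(r"[^a-z0-9]+", path.lower()) if w]


# A few entries copied (rounded) from the mars.png output above.
weights = {"mars": 0.2340, "german": 0.1644, ".": 0.0861,
           "image": 0.0820, "images": 0.0703}

# Keep only the path words that also appear as tokens in the result.
hits = {w: weights[w] for w in path_words("../images/mars.png") if w in weights}
print(hits)
```

If the path words turn out to dominate, repeating the call on the image encoder (encoder_p, going by the comment in the snippet) would be a natural next check. Whether `dst` is available on that encoder is an assumption that nothing in this thread confirms.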

Below is the word cloud result for another example image, moto.png, for your reference: [image]

# {'motorcycle': 0.19214993715286255, 'images': 0.14219743013381958, 'created': 0.1317986398935318, 'german': 0.13101552426815033, 'mo': 0.12511393427848816, 'photography': 0.11549688875675201, '##ng': 0.11132083833217621, 'gray': 0.10592780262231827, 'sim': 0.10203593224287033, 'modeling': 0.09780745953321457, 'brand': 0.0959356427192688, 'europe': 0.09577344357967377, 'dirt': 0.09513884782791138, 'test': 0.09234508872032166, 'pages': 0.09030099958181381, 'sexy': 0.08675366640090942, 'remix': 0.08440207690000534, 'wide': 0.0832391008734703, 'strange': 0.08230818063020706, 'steel': 0.08197955042123795, 'champion': 0.08196212351322174, 'jump': 0.08170567452907562, 'china': 0.0797039195895195, 'data': 0.07912503182888031, 'focus': 0.07903577387332916, 'city': 0.07871531695127487, 'surfing': 0.07840601354837418, 'resident': 0.07788623124361038, 'de': 0.07702414691448212, 'india': 0.07653310149908066, 'image': 0.07615304738283157, 'japan': 0.07592455297708511, 'love': 0.07442747801542282, 'cup': 0.07378575950860977, 'page': 0.0732511579990387, 'pro': 0.07324665784835815, 'number': 0.07298716902732849, 'jumping': 0.07207448780536652, 'teenager': 0.07063709944486618, 'shoes': 0.07039686292409897, 'raced': 0.06941156834363937, 'screens': 0.06864826381206512, 'another': 0.06841610372066498, '-': 0.0683770552277565, 'soft': 0.0675305500626564, 'villages': 0.06728583574295044, '.': 0.0657622367143631, 'the': 0.06565755605697632, 'trails': 0.06542310863733292, 'beauty': 0.06524190306663513, 'son': 0.06483852863311768, 'skate': 0.06471055001020432, 'sky': 0.0647059977054596, 'something': 0.06461930274963379, 'hard': 0.06437405943870544, 'hospital': 0.06419741362333298, 'southern': 0.0641649141907692, 'smile': 0.06359004974365234, '##o': 0.06217104569077492, 'architecture': 0.061484407633543015, 'youth': 0.06075466796755791, 'protect': 0.060580406337976456, 'light': 0.06055312603712082, 'pattern': 0.059031762182712555, 'vamp': 0.05861247330904007, 'teacher': 
0.05856887251138687, 'win': 0.058388736099004745, 'skin': 0.058378107845783234, 'cycles': 0.05828609690070152, 'landscape': 0.058059681206941605, 'la': 0.05776188522577286, 'planes': 0.05767054110765457, 'singapore': 0.05755936726927757, 'highway': 0.057162102311849594, 'mark': 0.05628189444541931, 'brave': 0.05626475065946579, 'guidelines': 0.05612916499376297, 'celebration': 0.05581573024392128, 'summer': 0.0555352121591568, 'street': 0.054740648716688156, 'bike': 0.05458524078130722, 'lips': 0.05448034033179283, 'funny': 0.05405861511826515, '##r': 0.05370410531759262, 'skiing': 0.053339794278144836, 'races': 0.05287234112620354, 'own': 0.052578918635845184, 'mag': 0.052440155297517776, 'thanks': 0.05231965333223343, 'biker': 0.05227531120181084, 'building': 0.05181792378425598, 'medical': 0.051548078656196594, 'stone': 0.0514264814555645, 'buildings': 0.05062192678451538, 'day': 0.050512850284576416, 'final': 0.05021118372678757, 'runway': 0.05006387084722519, 'new': 0.049858350306749344, 'life': 0.049355048686265945, '##ike': 0.049268368631601334, 'shore': 0.0490327812731266, 'modelling': 0.04885286092758179, 'riders': 0.04860991612076759, 'roads': 0.04787522181868553, 'second': 0.0478598028421402, 'clubs': 0.047825008630752563, 'off': 0.04779557138681412, 'water': 0.047494806349277496, 'stores': 0.04736252874135971, 'marriage': 0.04708252102136612, 'wood': 0.04701148346066475, 'music': 0.0468100905418396, 'sites': 0.04653606191277504, 'female': 0.04598429426550865, 'forest': 0.04574466869235039, 'shot': 0.045556310564279556, 'forum': 0.04524679109454155, 'beautiful': 0.045051585882902145, 'journey': 0.04498792067170143, 'she': 0.044984813779592514, 'on': 0.044934190809726715, 'to': 0.044896453619003296, 'daughter': 0.04482080787420273, 'rodeo': 0.04471568763256073, 'glitter': 0.044680237770080566, 'resort': 0.044644664973020554, 'expo': 0.04463270679116249, 'me': 0.044403329491615295, 'noise': 0.04436328634619713, 'my': 0.04400146007537842, 'peninsula': 
0.043947748839855194, 'kelly': 0.043824899941682816, 'drift': 0.04367733374238014, 'speed': 0.04367073252797127, 'vehicle': 0.043490778654813766, 'winter': 0.04307040199637413, 'cities': 0.04290056601166725, 'circle': 0.04278441518545151, 'carlos': 0.04248139634728432, 'film': 0.042405422776937485, 'bicycles': 0.042402952909469604, 'built': 0.04217683523893356, 'answer': 0.04206503927707672, 'adam': 0.04197743907570839, 'impressive': 0.04169386252760887, 'tire': 0.041650641709566116, 'slalom': 0.04150165617465973, 'sandstone': 0.04117858409881592, 'alone': 0.040788061916828156, 'formation': 0.04076511785387993, '##ing': 0.04075249284505844, 'boat': 0.04065458104014397, 'royal': 0.040595732629299164, 'centre': 0.04025128856301308, 'temples': 0.039719149470329285, 'home': 0.0396411158144474, 'wine': 0.039608411490917206, 'wound': 0.03923806920647621, 'color': 0.039131149649620056, 'decorated': 0.03899906948208809, 'senior': 0.038639362901449203, 'force': 0.038618847727775574, '##to': 0.03792835399508476, 'after': 0.03788870573043823, 'live': 0.037530314177274704, 'trip': 0.037188321352005005, 'magazine': 0.037124667316675186, 'park': 0.03702264651656151, 'through': 0.0369219072163105, 'muscle': 0.036840882152318954, 'groom': 0.03681603819131851, 'rallies': 0.036706358194351196, 'mirror': 0.036540620028972626, 'group': 0.036507394164800644, 'sitting': 0.03648827224969864, 'flows': 0.036326516419649124, 'ro': 0.036284539848566055, 'electricity': 0.03602343052625656, 'racing': 0.035772912204265594, 'kong': 0.035728562623262405, 'mustang': 0.03570357337594032, 'stunt': 0.035692933946847916, 'islands': 0.03566957265138626, '##er': 0.035632554441690445, 'text': 0.03559718281030655, 'work': 0.035547737032175064, 'military': 0.03552808240056038, 'articles': 0.03528420254588127, 'australia': 0.03526008874177933, 'gaming': 0.03522680699825287, 'towards': 0.03520148992538452, 'field': 0.03505094721913338, 'newspaper': 0.0350499302148819, 'friend': 0.03488633781671524, 
'coaster': 0.03487618267536163, 'facility': 0.034798331558704376, 'groups': 0.03476154804229736, 'auditorium': 0.03475860506296158, 'jessica': 0.0345534048974514, 'burkina': 0.03450459986925125, 'young': 0.03448347747325897, 'london': 0.03393888473510742, 'eva': 0.03389455005526543, 'time': 0.03366893157362938, 'type': 0.033637840300798416, 'pair': 0.03363659977912903, 'food': 0.03351951017975807, 'president': 0.03347884118556976, '##a': 0.03335064649581909, 'nice': 0.03326261416077614, 'sunglasses': 0.032744020223617554, 'sweet': 0.03260355442762375, 'men': 0.032498568296432495, 'interview': 0.032400015741586685, 'long': 0.03231329843401909, 'glider': 0.032254382967948914, 'tournaments': 0.031944457441568375, 'mediterranean': 0.031929586082696915, 'modernist': 0.03175363689661026, 'desert': 0.031414907425642014, 'single': 0.03140967711806297, 'surf': 0.031216919422149658, 'farm': 0.03105226345360279, 'choir': 0.03102019801735878, 'foot': 0.030988218262791634, 'door': 0.030790483579039574, 'cover': 0.030689772218465805, 'lines': 0.03048785775899887, 'training': 0.030412273481488228, 'baby': 0.03039363957941532, 'of': 0.030330728739500046, 'close': 0.030301408842206, 'hair': 0.03023497946560383, 'perched': 0.030091384425759315, 'growth': 0.029937153682112694, 'mountains': 0.029923802241683006, 'up': 0.029862483963370323, 'warrior': 0.029622215777635574, 'finals': 0.029545381665229797, 'fighting': 0.02948414906859398, 'africa': 0.029478825628757477, 'barely': 0.02943694032728672, 'award': 0.028947459533810616, 'just': 0.02894475869834423, 'breeding': 0.028822628781199455, 'sleeping': 0.02859259396791458, 'night': 0.028506102040410042, 'autumn': 0.02834680862724781, 'van': 0.02821502834558487, 'design': 0.028122764080762863, 'motorcycles': 0.02810324728488922, 'ring': 0.02808431163430214, 'marathon': 0.028007084503769875, 'friends': 0.027944980189204216, 'designs': 0.027848806232213974, '##page': 0.027701109647750854, 'fence': 0.027590962126851082, 'survey': 
0.027527445927262306, 'silver': 0.02746395207941532, 'wrist': 0.02745034731924534, 'java': 0.02719190903007984, 'see': 0.027040859684348106, 'bangkok': 0.02698548138141632, 'wild': 0.02698538824915886, 'what': 0.026878369972109795, 'clouds': 0.026843996718525887, 'statistics': 0.02677232027053833, 'speech': 0.026719799265265465, 'area': 0.02671760693192482, 'little': 0.02667716145515442, '##igraphy': 0.026602081954479218, 'persons': 0.026596076786518097, 'qualifying': 0.026527859270572662, 'czech': 0.026515120640397072, 'joe': 0.02649211511015892, 'shoe': 0.026463337242603302, 'rides': 0.02644595503807068, 'tracking': 0.026399772614240646, 'else': 0.026298360899090767, 'egypt': 0.026211021468043327, 'face': 0.02620883099734783, 'meets': 0.026147399097681046, 'winters': 0.026039153337478638, 'busy': 0.02603684552013874, 'start': 0.026029352098703384, 'their': 0.026028046384453773, 'pictures': 0.02599869668483734, 'television': 0.02588104084134102, 'kite': 0.02550981380045414, 'pass': 0.02543158084154129, 'registration': 0.02518000267446041, 'graph': 0.025110485032200813, 'crashed': 0.024921903386712074, 'italy': 0.024758489802479744, 'riding': 0.0244978666305542, 'crowd': 0.024487636983394623, 'zebra': 0.02448057010769844, 'writer': 0.02447158843278885, 'mindanao': 0.024336976930499077, 'painted': 0.02419690415263176, 'metal': 0.024183988571166992, 'boys': 0.024025047197937965, 'eyes': 0.023973898962140083, 'document': 0.023822462186217308, '##tically': 0.023783594369888306, 'valley': 0.023630911484360695, 'uniform': 0.023625943809747696, 'king': 0.02358976937830448, 'we': 0.023504290729761124, 'track': 0.023453140631318092, 'club': 0.023399848490953445, 'hd': 0.023389656096696854, 'cleaning': 0.023302266374230385, 'over': 0.023239891976118088, 'games': 0.0232244785875082, 'para': 0.02321302518248558, 'sand': 0.023184683173894882, 'aluminum': 0.023123839870095253, 'swimming': 0.023117372766137123, '##oot': 0.02302316203713417, 'european': 0.02300228923559189, '##y': 
0.02287578582763672, 'construction': 0.0227555800229311, 'store': 0.022748315706849098, 'media': 0.022638395428657532, 'side': 0.022587260231375694, 'hunting': 0.022567206993699074, 'coast': 0.022445783019065857, 'suv': 0.02236005663871765, 'for': 0.0223303884267807, 'supporter': 0.02232874184846878, 'empty': 0.02230333909392357, 'talk': 0.022147784009575844, 'alpine': 0.02210904285311699, 'constructing': 0.022009938955307007, 'rest': 0.021973293274641037, 'run': 0.0219485592097044, 'council': 0.02186267450451851, 'school': 0.021838078275322914, 'top': 0.02179340273141861, 'graffiti': 0.021728387102484703, 'score': 0.021555673331022263, 'ipod': 0.02153007872402668, 'land': 0.02150307036936283, 'all': 0.021481914445757866, 'flower': 0.021426962688565254, 'sugar': 0.02142559364438057, 'arrows': 0.021407386288046837, 'bright': 0.021396758034825325, 'grey': 0.021361086517572403, 'wedding': 0.021281681954860687, 'band': 0.021116185933351517, 'a': 0.021094363182783127, 'versus': 0.02107076905667782, 'interior': 0.02092820778489113, 'fashion': 0.020879892632365227, 'down': 0.02079424448311329, 'drives': 0.02077545039355755, 'art': 0.020761094987392426, 'circuits': 0.020732125267386436, 'automobile': 0.020677195861935616, 'wight': 0.02062208764255047, 'jet': 0.02061646059155464, 'climbing': 0.020596792921423912, 'caribbean': 0.020535029470920563, 'hallway': 0.020422227680683136, 'concerts': 0.020377153530716896, 'contestants': 0.0203679371625185, 'daniel': 0.020358363166451454, 'lists': 0.020353732630610466, 'dare': 0.020317649468779564, 'allie': 0.020283756777644157, 'convent': 0.02028234861791134, 'wave': 0.020252281799912453, 'smoke': 0.020231259986758232, 'lauderdale': 0.020222336053848267, 'everest': 0.02021915838122368, 'auto': 0.02016119658946991, 'covering': 0.020038733258843422, 'arrangements': 0.020034682005643845, 'artwork': 0.020027300342917442, 'dresses': 0.01990707963705063, 'cruiser': 0.01982446014881134, 'minister': 0.019772669300436974, 'posts': 
0.019769981503486633, 'in': 0.019703848287463188, 'converse': 0.019654104486107826, 'comics': 0.019526366144418716, 'curved': 0.019459856674075127, 'filming': 0.01944178342819214, 'transport': 0.019421638920903206, 'graphs': 0.019420573487877846, 'england': 0.019341394305229187, 'finished': 0.019330278038978577, 'israeli': 0.019322780892252922, 'train': 0.01930440217256546, 'estuary': 0.019264454022049904, 'colors': 0.019251134246587753, 'not': 0.019244981929659843, 'clothes': 0.019196324050426483, 'cycling': 0.019192397594451904, 'rock': 0.019139660522341728, 'entrance': 0.019131937995553017, 'they': 0.019118938595056534, 'girls': 0.01909857988357544, 'creates': 0.019001340493559837, 'straight': 0.01896584965288639, 'ramp': 0.018880847841501236, 'yarn': 0.018834875896573067, '##so': 0.018782056868076324, 'en': 0.018770204856991768, 'plenty': 0.018760256469249725, 'cooked': 0.018726006150245667, 'columns': 0.01871844008564949, 'woman': 0.018644992262125015, 'ghosts': 0.018612824380397797, 'women': 0.018572252243757248, 'around': 0.018539708107709885, 'go': 0.01847694255411625, 'motors': 0.01846553198993206, 'fall': 0.0183357335627079, 'shirts': 0.018309278413653374, 'marrying': 0.018295999616384506, 'lights': 0.018228383734822273, 'state': 0.01821323111653328, 'scenic': 0.018161486834287643, 'birth': 0.018102236092090607, 'he': 0.018012026324868202, 'combat': 0.01794506050646305, 'musicians': 0.017915677279233932, '##n': 0.0179031603038311, 'wonder': 0.017847293987870216, 'boxes': 0.0178367979824543, 'year': 0.01778939738869667, 'by': 0.017777521163225174, 'machines': 0.017745764926075935, 'carved': 0.01766805350780487, 'nail': 0.017648814246058464, 'lace': 0.017632918432354927, 'fiber': 0.017598960548639297, 'and': 0.017584890127182007, 'horse': 0.017577601596713066, 'troops': 0.017538417130708694, 'older': 0.017492037266492844, '##paper': 0.017420770600438118, 'kate': 0.017374828457832336, 'post': 0.01735268533229828, 'illustrator': 0.01734430342912674, 
'california': 0.017341289669275284, 'neighborhood': 0.017324162647128105, 'name': 0.01731964386999607, 'fire': 0.01724536158144474, 'sword': 0.017175009474158287, 'hit': 0.01716987043619156, 'appeared': 0.01715001091361046, 'disabled': 0.01714109443128109, 'conferences': 0.01712385006248951, 'come': 0.017100051045417786, 'meeting': 0.016997279599308968, 'begun': 0.01699347048997879, 'anything': 0.016835790127515793, 'walls': 0.01680733822286129, 'logo': 0.016802411526441574, 'paulo': 0.016786707565188408, 'decade': 0.01674758829176426, 'cliffs': 0.016719533130526543, 'more': 0.016667956486344337, 'category': 0.016647402197122574, 'workplace': 0.016606232151389122, 'its': 0.016560135409235954, 'late': 0.016551874577999115, 'sunset': 0.016521083191037178, 'nipple': 0.016448084264993668, ',': 0.016411125659942627, 'hills': 0.01641038805246353, 'help': 0.016349075362086296, 'conversation': 0.01633346639573574, 'motorway': 0.016318930312991142, 'arrangement': 0.01628204621374607, 'difficult': 0.01627170853316784, 'people': 0.0162493996322155, 'best': 0.016208361834287643, 'south': 0.016187559813261032, 'poker': 0.016186445951461792, 'word': 0.016147498041391373, 'there': 0.016147473827004433, 'running': 0.0160602405667305, 'agricultural': 0.016014600172638893, 'french': 0.01600603573024273, 'fleet': 0.015939155593514442, 'suite': 0.01591530442237854, 'cattle': 0.015896307304501534, 'gps': 0.015896065160632133, 'driver': 0.01589447446167469, 'i': 0.015847347676753998, 'numbers': 0.01578625850379467, 'trees': 0.015775445848703384, 'festival': 0.015738587826490402, 'lumpur': 0.015727318823337555, 'cars': 0.01570112816989422, 'dress': 0.015655124559998512, 'racks': 0.015614982694387436, 'hot': 0.01557366456836462, 'standing': 0.015523117035627365, 'last': 0.015512230806052685, 'lying': 0.015487135387957096, 'panorama': 0.015470068901777267, 'toys': 0.015445556491613388, 'next': 0.015439731068909168, 'cards': 0.015344187617301941, 'airfield': 0.01531196478754282, 'guests': 
0.015307535417377949, 'fallen': 0.015295048244297504, 'flew': 0.015258761122822762, 'computer': 0.01519910991191864, 'your': 0.015143627300858498, 'monaco': 0.015013882890343666, 'sports': 0.015012077055871487, 'animal': 0.01489216648042202, 'posters': 0.014877188950777054, 'snaps': 0.01483413390815258, 'air': 0.014816074632108212, 'pete': 0.014801671728491783, 'writing': 0.014763940125703812, 'b': 0.01475487370043993, '##ding': 0.014738839119672775, 'tokyo': 0.014734970405697823, 'g': 0.014705500565469265, 'disco': 0.014688928611576557, 'dark': 0.014662611298263073, 'curls': 0.014629114419221878, '##8': 0.01455352921038866, 'staff': 0.014529677107930183, 'open': 0.014522654004395008, 'automobiles': 0.014439165592193604, 'court': 0.014437513425946236, 'house': 0.014412741176784039, 'trail': 0.0143731152638793, '##11': 0.014349646866321564, 'if': 0.014349129050970078, 'iceland': 0.014299378730356693, 'stockholm': 0.01429907139390707, 'paris': 0.014272312633693218, 'concrete': 0.014264949597418308, 'chicago': 0.014237318187952042, 'skyscraper': 0.014195841737091541, 'gold': 0.014182980172336102, 'indian': 0.014168139547109604, 'illinois': 0.014132884331047535, 'across': 0.014054491184651852, 'storm': 0.014035490341484547, 'mara': 0.014030407182872295, 'shooting': 0.014002588577568531, 'glass': 0.013989686034619808, 'maintenance': 0.013985292986035347, 'these': 0.01395443081855774, 'coin': 0.01394661981612444, 'our': 0.013929271139204502, 'road': 0.01387818343937397, 'magazines': 0.013851105235517025, 'apartments': 0.013848722912371159, 'cups': 0.013835337944328785, 'one': 0.013802585192024708, 'walk': 0.013742801733314991, 'fishermen': 0.013742516748607159, 'blossoms': 0.01372458878904581, 'crossing': 0.013715928420424461, 'phone': 0.013663948513567448, 'voting': 0.013663209974765778, 'fast': 0.013633794151246548, 'portugal': 0.013577647507190704, 'passport': 0.013566430658102036, 'driveway': 0.013512706384062767, 'sun': 0.013494763523340225, 'scroll': 
0.013484204187989235, 'parties': 0.013447999022901058, 'at': 0.013429984450340271, 'feeding': 0.013427806086838245, 'uci': 0.01340564712882042, 'got': 0.013394949026405811, 'tour': 0.013391916640102863, 'organized': 0.013389411382377148, 'broken': 0.013381951488554478, 'm': 0.013370135799050331, 'floats': 0.013362487778067589, 'limestone': 0.013327321968972683, 'date': 0.013306871056556702, 'is': 0.013294368982315063, 'camp': 0.013289464637637138, 'guide': 0.013259547762572765, 'church': 0.013242506422102451, 'show': 0.013233322650194168, 'airport': 0.013166327029466629, 'joined': 0.013114415109157562, 'manner': 0.013114388100802898, 'texture': 0.013088351115584373, 'elephant': 0.013070783577859402, 'flown': 0.013043011538684368, 'philippines': 0.013040580786764622, 'birthday': 0.013001060113310814, 'summit': 0.012994382530450821, 'trains': 0.012967993505299091, 'ins': 0.012925160117447376, 'spray': 0.012889289297163486, 'worn': 0.012853036634624004, 'pageant': 0.01282268762588501, 'flooded': 0.012805831618607044, 'return': 0.012803933583199978, 'golden': 0.012753787450492382, 'fly': 0.012705220840871334, 'stadiums': 0.012701564468443394, 'badge': 0.012677855789661407, 'hillside': 0.012676836922764778, 'patterns': 0.012665028683841228, 'together': 0.01264972798526287, 'north': 0.012649502605199814, 'atoll': 0.012647993862628937, 'chassis': 0.012636643834412098, 'tackle': 0.01263144426047802, 'feminine': 0.012627767398953438, 'plain': 0.012609141878783703, 'graphics': 0.01260832604020834, 'ireland': 0.012594684027135372, 'views': 0.01256586518138647, 'march': 0.012529943138360977, 'roots': 0.012520660646259785, 'you': 0.012519710697233677, 'timber': 0.012516258284449577, 'pipes': 0.012472421862185001, 'furniture': 0.012456361204385757, 'tattoo': 0.01245201751589775, 'homework': 0.012378126382827759, 'babies': 0.01236709300428629, '##g': 0.012359540909528732, 'paper': 0.012358162552118301, 'ed': 0.012350082397460938, 'james': 0.012340719811618328, 'here': 
0.012287304736673832, 'wet': 0.012266302481293678, 'brisbane': 0.012248328886926174, 'sea': 0.012238863855600357, 'con': 0.012234183959662914, 'miss': 0.01222695130854845, 'bartender': 0.01218860037624836, 'explosion': 0.012184279970824718, 'brook': 0.01218137051910162, 'class': 0.012178455479443073, 'portraits': 0.012158839963376522, 'mesh': 0.012120704166591167, 'clear': 0.012105664238333702, 'winners': 0.012104860506951809, 'p': 0.012082230299711227, 'album': 0.012032611295580864, 'chateau': 0.012028166092932224, 'international': 0.012027224525809288, 'money': 0.01201920211315155, 'footprints': 0.012010014615952969, '##9': 0.012002901174128056, 'october': 0.011999120935797691, 'glen': 0.011967172846198082, 'zip': 0.011948238126933575, 'circles': 0.01194760762155056, '##pro': 0.011928165331482887, 'headline': 0.011926730163395405, 'commercial': 0.011923984624445438, 'addition': 0.011898109689354897, 'chrome': 0.011884268373250961, 'rubber': 0.01185522973537445, 'sign': 0.011840018443763256, 'climbed': 0.01182550098747015, 'floral': 0.01180923543870449, 'dinner': 0.011761641129851341, 'vietnam': 0.011737983673810959, 'rice': 0.011733711697161198, 'rural': 0.01170607004314661, 'fish': 0.011677242815494537, 'this': 0.011620882898569107, 'reporter': 0.011604774743318558, 'iaaf': 0.01157041173428297, '##e': 0.011552607640624046, 'staring': 0.011543200351297855, 'rise': 0.011507304385304451, 'millennium': 0.01150390226393938, 've': 0.011503015644848347, 'jordan': 0.011502329260110855, 'please': 0.011493702419102192, 'gp': 0.011492498219013214, 'sticks': 0.011492307297885418, 'laptop': 0.011483244597911835, 'sunlight': 0.011480101384222507, 'good': 0.011472394689917564, 'depicts': 0.011447289027273655, 'watching': 0.011431216262280941, 'toyota': 0.011409970931708813, 'flames': 0.011387850157916546, 'germany': 0.011365937069058418, '##ang': 0.011354826390743256, 'line': 0.011354411020874977, 'auckland': 0.01135417353361845, 'holland': 0.011352483183145523, 'paths': 
0.011319166049361229, 'exploring': 0.011302805505692959, 'closed': 0.011276411823928356, '##0': 0.011270977556705475, 'glacier': 0.011260641738772392, 'workshops': 0.011236144229769707, 'towns': 0.011205038987100124, 'laboratories': 0.011167679913341999, 'crew': 0.01115912850946188, 'blue': 0.011134626343846321, 'swimmers': 0.01110626570880413, 'david': 0.011102713644504547, 'then': 0.011083239689469337, 'gay': 0.011067390441894531, 'soon': 0.011042667552828789, 'decided': 0.011018353514373302, 'common': 0.01101773977279663, 'mud': 0.010985473170876503, 'hill': 0.010958608239889145, 'river': 0.010937264189124107, '##car': 0.01091019157320261, 'bank': 0.010875090025365353, 'filters': 0.010865611955523491, 'hindu': 0.010846198536455631, 'artifacts': 0.010835027322173119, 'motorsport': 0.010827927850186825, 'memories': 0.010825654491782188, 'path': 0.010797563940286636, 'magical': 0.010796461254358292, 'palm': 0.010796215385198593, 'lives': 0.010794299654662609, '9': 0.010751789435744286, 'buddhist': 0.010666007176041603, 'younger': 0.010665705427527428, 'ballet': 0.010646138340234756, 'stations': 0.010626520961523056, 'bands': 0.010625699535012245, 'parents': 0.010615128092467785, 'entertainment': 0.010599835775792599, 'studying': 0.010560221038758755, 'sweat': 0.010536428540945053, 'h': 0.010534947738051414, 'view': 0.01053050346672535, 'teachers': 0.010515806265175343, 'envelope': 0.010510440915822983, 'police': 0.010485615581274033, 'herd': 0.010481791570782661, 'crystal': 0.01047755591571331, 'officer': 0.010434183292090893, 'april': 0.010416361503303051, 'actress': 0.010415572673082352, 'ran': 0.010409045964479446, 'airbus': 0.010406381450593472, 'formula': 0.010397275909781456, 'marker': 0.010343631729483604, '##2': 0.010336126200854778, 'platforms': 0.0103150624781847, 'mumbai': 0.010282184928655624, 'flow': 0.010280688293278217, 'swedish': 0.01027995627373457, 'racecourse': 0.010268604382872581, 'out': 0.010266498662531376, 'innocent': 0.010259914211928844, 
'front': 0.010256759822368622, 'flowers': 0.010244320146739483, 'her': 0.010233558714389801, 'sunny': 0.010215209797024727, 'truck': 0.010204166173934937}
jzhoubu commented 4 months ago

We use encoder_p for images, so the disentanglement call should go through encoder_p rather than encoder_q. You can change your call as follows:

disentanglement = vdr_cross_modal.encoder_p.dst(image_file, k=768, visual=True) # Generate a word cloud if `visual`=True
print(disentanglement)
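
Printing the full result with k=768 is hard to read. As a minimal sketch, assuming the returned value is a plain dict mapping each token to its weight (which is what the printed output above suggests), you could keep only the highest-weighted tokens and skip WordPiece fragments such as `##r`. The helper name `top_tokens` and its parameters are hypothetical and not part of vsearch.

```python
from typing import Dict, List, Tuple


def top_tokens(
    weights: Dict[str, float], n: int = 10, skip_subwords: bool = True
) -> List[Tuple[str, float]]:
    """Return the n highest-weighted (token, weight) pairs.

    Hypothetical helper, not part of vsearch. Assumes `weights` is a
    token -> weight dict like the one printed in this thread.
    """
    items = (
        (token, weight)
        for token, weight in weights.items()
        # Tokens starting with "##" are WordPiece continuation fragments.
        if not (skip_subwords and token.startswith("##"))
    )
    return sorted(items, key=lambda pair: pair[1], reverse=True)[:n]


# A few entries copied from the output posted above.
sample = {
    "win": 0.058388736099004745,
    "##r": 0.05370410531759262,
    "street": 0.054740648716688156,
    "bike": 0.05458524078130722,
}

for token, weight in top_tokens(sample, n=3):
    print(f"{token}: {weight:.4f}")
```

With the real output you would pass `disentanglement` instead of `sample`; set `skip_subwords=False` if you want to keep the fragments.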
Clementine24 commented 4 months ago

Problem solved! Thank you very much for your response.

jzhoubu commented 4 months ago

Great to hear that the problem has been resolved! If you run into anything else, feel free to reopen this issue.