long8v PTIR issues - Githubissues

long8v / PTIR

Paper Today I Read

19 stars, 0 forks

issues

[160] ALOHa: A New Measure for Hallucination in Captioning Models

#179 long8v opened 2 weeks ago
0
[159] Long-CLIP: Unlocking the Long-Text Capability of CLIP

#178 long8v opened 1 month ago
0
[158] A Mathematical Framework for Transformer Circuits

#177 long8v opened 1 month ago
0
240507 add text span

#176 long8v closed 1 month ago
0
feat: add text span

#175 long8v opened 1 month ago
0
feat: add LeGrad

#174 long8v opened 1 month ago
0
[157] LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

#173 long8v opened 1 month ago
1
[156] Interpreting CLIP's Image Representation via Text-Based Decomposition

#172 long8v opened 1 month ago
0
[155] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

#171 long8v opened 2 months ago
1
feat: llava next hf implementation

#170 long8v opened 2 months ago
0
[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

#169 long8v opened 3 months ago
0
[153] Contrastive Explanations for Model Interpretability

#168 long8v opened 3 months ago
0
🍅 Jjapjjari paper collection (XAI)

#167 long8v closed 1 month ago
8
🍅 Jjapjjari paper collection (VLM)

#166 long8v closed 1 month ago
1
🍅 Jjapjjari paper collection (CLIP)

#165 long8v closed 1 month ago
17
[152] Sigmoid Loss for Language Image Pre-Training

#164 long8v opened 3 months ago
0
[151] FOIL it! Find One mismatch between Image and Language caption

#163 long8v opened 4 months ago
0
[150] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

#162 long8v opened 4 months ago
0
[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

#161 long8v opened 4 months ago
1
[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

#160 long8v opened 4 months ago
0
[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

#159 long8v opened 4 months ago
2
[146] Transformer Interpretability Beyond Attention Visualization

#158 long8v opened 4 months ago
0
[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning

#157 long8v opened 4 months ago
1
[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

#156 long8v opened 6 months ago
0
[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

#155 long8v opened 6 months ago
2
[142] Trust Region Policy Optimization

#154 long8v opened 6 months ago
2
[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

#153 long8v opened 6 months ago
0
[140] Improved Baselines with Visual Instruction Tuning

#152 long8v opened 6 months ago
1
[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

#151 long8v opened 6 months ago
0
[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

#150 long8v opened 6 months ago
0
[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

#149 long8v opened 7 months ago
1
[136] Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

#148 long8v opened 7 months ago
0
[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

#147 long8v opened 7 months ago
1
[134] Asynchronous Methods for Deep Reinforcement Learning

#146 long8v opened 8 months ago
0
[133] DataComp: In search of the next generation of multimodal datasets

#145 long8v opened 9 months ago
0
[132] Hyperbolic Image-Text Representations

#144 long8v opened 9 months ago
0
[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

#143 long8v opened 9 months ago
1
[130] Segment Anything

#142 long8v opened 10 months ago
1
[129] Grounding Language Models to Images for Multimodal Inputs and Outputs

#141 long8v opened 10 months ago
0
[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

#140 long8v opened 10 months ago
0
[127] Linearly Mapping from Image to Text Space

#139 long8v opened 10 months ago
0
[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

#138 long8v opened 10 months ago
0
[125] RILS: Masked Visual Reconstruction in Language Semantic Space

#137 long8v opened 11 months ago
0
feat: add sparse rcnn

#136 long8v opened 11 months ago
0
[124] LiT: Zero-Shot Transfer with Locked-image text Tuning

#135 long8v opened 12 months ago
0
[123] Robust fine-tuning of zero-shot models

#134 long8v opened 12 months ago
0
[122] Neural Architecture Search without Training

#133 long8v opened 1 year ago
0
[122]

#132 long8v closed 1 year ago
1
[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

#131 long8v opened 1 year ago
0
feat: add open-clip

#130 long8v opened 1 year ago
0